Sequential read is the simplest case, and it is pretty hard to improve on
the current raw performance. (The HDFS client does take more CPU than one
might expect; Todd implemented an improvement that reduces the CPU consumed.)

Just to reiterate what Todd said, sequential reads already get an implicit
read ahead from the TCP buffers and from kernel read ahead on the Datanodes.

If you extend the read ahead buffer to be more of a buffer cache for the
block, it could have a big impact for some read access patterns (e.g. binary
search).
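
To make the buffer cache idea concrete, here is a rough sketch in plain Java.
It is not HDFS code: ChunkCache, Source, arraySource and the chunk sizes are
all hypothetical names and values chosen for illustration. It keeps
recently fetched fixed-size chunks of a block in an LRU map, so a binary
search that keeps probing the same regions only goes to the underlying
source on a miss.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: an LRU cache of fixed-size chunks in front of a positioned reader. */
class ChunkCache {
    /** Stand-in for a positioned read against a remote block (not an HDFS interface). */
    interface Source {
        int length();
        // Copies up to len bytes starting at pos into buf[off..]; returns the count, or -1 at end.
        int read(int pos, byte[] buf, int off, int len);
    }

    private final Source source;
    private final int chunkSize;
    private final Map<Integer, byte[]> chunks;
    int remoteReads = 0; // how many chunks were fetched from the underlying source

    ChunkCache(Source source, int chunkSize, final int maxChunks) {
        this.source = source;
        this.chunkSize = chunkSize;
        // An access-ordered LinkedHashMap evicts the least recently used chunk.
        this.chunks = new LinkedHashMap<Integer, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, byte[]> eldest) {
                return size() > maxChunks;
            }
        };
    }

    /** Returns the byte at pos (0 <= pos < length), fetching the enclosing chunk on a miss. */
    byte get(int pos) {
        int index = pos / chunkSize;
        byte[] chunk = chunks.get(index);
        if (chunk == null) {
            int start = index * chunkSize;
            int len = Math.min(chunkSize, source.length() - start);
            chunk = new byte[len];
            int filled = 0;
            while (filled < len) { // tolerate short reads from the source
                int n = source.read(start + filled, chunk, filled, len - filled);
                if (n <= 0) break;
                filled += n;
            }
            remoteReads++;
            chunks.put(index, chunk);
        }
        return chunk[pos - index * chunkSize];
    }

    /** In-memory source used only to exercise the sketch. */
    static Source arraySource(final byte[] data) {
        return new Source() {
            public int length() { return data.length; }
            public int read(int pos, byte[] buf, int off, int len) {
                int n = Math.min(len, data.length - pos);
                if (n <= 0) return -1;
                System.arraycopy(data, pos, buf, off, n);
                return n;
            }
        };
    }

    /** Binary search over ascending bytes, reading every probe through the cache. */
    static int binarySearch(ChunkCache cache, int length, byte key) {
        int lo = 0, hi = length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            byte v = cache.get(mid);
            if (v < key) lo = mid + 1;
            else if (v > key) hi = mid - 1;
            else return mid;
        }
        return -1;
    }
}
```

With a cache like this, repeating a search (or searching for a nearby key)
touches mostly chunks that are already in memory, whereas a plain read ahead
buffer is thrown away as soon as the read position jumps backwards.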

Raghu.

On Mon, Nov 23, 2009 at 11:23 PM, Martin Mituzas <[email protected]> wrote:

>
> I read the code and find the call
> DFSInputStream.read(buf, off, len)
> will cause the DataNode to read len bytes (or fewer if it encounters the
> end of the block). Why does HDFS not read ahead to improve performance for
> sequential reads?
> --
> View this message in context:
> http://old.nabble.com/why-does-not-hdfs-read-ahead---tp26491449p26491449.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>
