Also, keep in mind that when you open a block for reading, the DN
immediately starts writing the entire block (assuming it's requested via the
xceiver protocol); TCP backpressure on the send window is what provides flow
control there. So, although the client isn't explicitly reading ahead, most
reads on DFSInputStream should be served from the TCP receive buffer rather
than making round trips.
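
To illustrate the pattern, here is a minimal loopback sketch in plain Java. It
is not HDFS code: the StreamingBlockDemo class, the streamAndRead method, and
the chunk sizes are all hypothetical stand-ins. The sender thread plays the
role of the DN and pushes the whole "block" as soon as the connection opens,
while the reader issues small reads without sending any further requests.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class StreamingBlockDemo {
    // Hypothetical stand-in for a DN: on connect, write the whole "block"
    // without waiting for per-read requests from the client.
    static long streamAndRead(int blockSize, int readSize) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread sender = new Thread(() -> {
                try (Socket s = server.accept();
                     OutputStream out = s.getOutputStream()) {
                    byte[] chunk = new byte[64 * 1024];
                    int remaining = blockSize;
                    while (remaining > 0) {
                        int n = Math.min(chunk.length, remaining);
                        // write() blocks once the socket buffers fill up,
                        // so TCP itself paces the sender to the reader.
                        out.write(chunk, 0, n);
                        remaining -= n;
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            sender.start();

            long total = 0;
            try (Socket client = new Socket(loopback, server.getLocalPort());
                 InputStream in = client.getInputStream()) {
                byte[] buf = new byte[readSize];
                int n;
                // Small reads; no request is sent to the peer per read.
                while ((n = in.read(buf, 0, buf.length)) != -1) {
                    total += n;
                }
            }
            sender.join();
            return total;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(streamAndRead(8 * 1024 * 1024, 4096));
    }
}
```

The reader never asks for the next chunk; it just drains whatever the kernel
has already buffered, which is the behavior described above.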

At one point a few weeks ago I did hack explicit readahead around
DFSInputStream and didn't see an appreciable difference. I didn't spend much
time on it, though, so I may have screwed something up; it wasn't a
scientific test.
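
For anyone who wants to repeat that kind of experiment, one simple form of
explicit client-side readahead is to wrap the stream in a large
java.io.BufferedInputStream. The sketch below is only an assumption about
what such a wrapper could look like, not the hack described above. It uses an
in-memory stream as a stand-in for DFSInputStream, and the ReadaheadSketch
and CountingStream names are hypothetical. It counts how many read calls
reach the wrapped stream with and without the buffer.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadaheadSketch {
    // Counts how many bulk read calls reach the wrapped stream.
    static class CountingStream extends FilterInputStream {
        int calls = 0;

        CountingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Reads dataSize bytes in readSize pieces. If readaheadSize > 0, the
    // raw stream is wrapped in a BufferedInputStream of that size.
    // Returns the number of read calls seen by the raw stream.
    static int underlyingReads(int dataSize, int readSize, int readaheadSize)
            throws IOException {
        // In-memory stand-in for DFSInputStream (assumption for the demo).
        CountingStream raw =
            new CountingStream(new ByteArrayInputStream(new byte[dataSize]));
        InputStream in = readaheadSize > 0
            ? new BufferedInputStream(raw, readaheadSize)
            : raw;
        byte[] buf = new byte[readSize];
        while (in.read(buf, 0, buf.length) != -1) {
            // Discard the data; only the call count matters here.
        }
        return raw.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unbuffered: " + underlyingReads(1 << 20, 4096, 0));
        System.out.println("buffered:   " + underlyingReads(1 << 20, 4096, 1 << 16));
    }
}
```

The buffer cuts the number of calls into the wrapped stream, but if those
calls were already being served from the TCP receive buffer, fewer calls
would not necessarily mean a measurable speedup.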

-Todd

On Tue, Nov 24, 2009 at 10:02 AM, Eli Collins <[email protected]> wrote:

> Hey Martin,
>
> It would be an interesting experiment but I'm not sure it would
> improve things as the host (and hardware to some extent) are already
> reading ahead. A useful exercise would be to evaluate whether the new
> default host parameters for on-demand readahead are suitable for
> hadoop.
>
> http://lwn.net/Articles/235164
> http://lwn.net/Articles/235181
>
> Thanks,
> Eli
>
> On Mon, Nov 23, 2009 at 11:23 PM, Martin Mituzas <[email protected]> wrote:
> >
> > I read the code and found that the call
> > DFSInputStream.read(buf, off, len)
> > causes the DataNode to read len bytes (or fewer if it encounters the end
> > of the block). Why doesn't HDFS read ahead to improve performance for
> > sequential reads?
> > --
> > View this message in context:
> > http://old.nabble.com/why-does-not-hdfs-read-ahead---tp26491449p26491449.html
> > Sent from the Hadoop core-user mailing list archive at Nabble.com.
> >
> >
>
