Also, keep in mind that when you open a block for reading, the DataNode (DN) immediately starts streaming the entire block to the client (assuming it's requested via the xceiver protocol). Flow control there comes from TCP backpressure: once the client's receive window fills up, the DN's writes block until the client reads more. So, although the client isn't explicitly reading ahead, most reads on DFSInputStream should be served from the local TCP receive buffer rather than making a round trip to the DN.
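To make the backpressure point concrete, here is a rough sketch using plain loopback sockets in Python. It is not HDFS code and does not use the xceiver protocol: the "writer" socket just stands in for the DN, the "reader" socket for the client, and BLOCK_SIZE, CHUNK and both function names are made up for the illustration. The writer pushes data until the kernel refuses more, and the reader then drains bytes it never explicitly asked for.

```python
import socket

BLOCK_SIZE = 64 * 1024 * 1024  # hypothetical stand-in for one HDFS block
CHUNK = 64 * 1024              # hypothetical write/read size


def push_until_blocked(sender, total):
    """Send without blocking until the kernel refuses more; return bytes queued."""
    sender.setblocking(False)
    payload = b"x" * CHUNK
    sent = 0
    try:
        while sent < total:
            sent += sender.send(payload[: min(CHUNK, total - sent)])
    except BlockingIOError:
        # Send buffer and the peer's receive window are full: this is the
        # backpressure that throttles the writer.
        pass
    return sent


def backpressure_demo():
    """Return (bytes queued by the writer, bytes later drained by the reader)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    reader = socket.create_connection(listener.getsockname())
    writer, _ = listener.accept()
    listener.close()

    # The writer starts pushing the "block" before the reader has read anything.
    queued = push_until_blocked(writer, BLOCK_SIZE)

    # The reader now drains what is already buffered on its behalf.
    reader.settimeout(5)
    drained = 0
    while drained < queued:
        data = reader.recv(CHUNK)
        if not data:
            break
        drained += len(data)

    writer.close()
    reader.close()
    return queued, drained


if __name__ == "__main__":
    queued, drained = backpressure_demo()
    print("queued before blocking:", queued, "bytes; drained:", drained, "bytes")
```

How much gets queued before the writer stalls depends on the OS socket buffer settings, so the number itself isn't meaningful. The point is only that data is buffered ahead of the reader without any per-read request.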
At one point a few weeks ago I did hack explicit readahead around DFSInputStream and didn't see an appreciable difference. I didn't spend much time on it, though, so I may have screwed something up - it wasn't a scientific test.

-Todd

On Tue, Nov 24, 2009 at 10:02 AM, Eli Collins <[email protected]> wrote:
> Hey Martin,
>
> It would be an interesting experiment but I'm not sure it would
> improve things as the host (and hardware to some extent) are already
> reading ahead. A useful exercise would be to evaluate whether the new
> default host parameters for on-demand readahead are suitable for
> hadoop.
>
> http://lwn.net/Articles/235164
> http://lwn.net/Articles/235181
>
> Thanks,
> Eli
>
> On Mon, Nov 23, 2009 at 11:23 PM, Martin Mituzas <[email protected]> wrote:
> >
> > I read the code and find the call
> > DFSInputStream.read(buf, off, len)
> > will cause the DataNode read len bytes (or less if encountering the end of
> > block), why does not hdfs read ahead to improve performance for sequential
> > read?
> > --
> > View this message in context: http://old.nabble.com/why-does-not-hdfs-read-ahead---tp26491449p26491449.html
> > Sent from the Hadoop core-user mailing list archive at Nabble.com.
