Re: why does not hdfs read ahead ?

Todd Lipcon Tue, 24 Nov 2009 10:37:05 -0800

On Tue, Nov 24, 2009 at 10:33 AM, Brian Bockelman <[email protected]> wrote:


>
> On Nov 24, 2009, at 12:06 PM, Todd Lipcon wrote:
>
> > Also, keep in mind that, when you open a block for reading, the DN
> > immediately starts writing the entire block (assuming it's requested via
> > the xceiver protocol) - it's TCP backpressure on the send window that does
> > flow control there.
>
> Ok, that's a pretty freakin' cool idea.  Is it well-documented how this
> technique works?  How does this affect folks (me) who use the pread
> interface?
>

AFAIK using pread sends the actual length with the OP_READ_BLOCK command, so
it doesn't read ahead past what you ask for. The awful thing about pread is
that it actually makes a new datanode connection for every read - including
the TCP handshake round trip, thread setup/teardown, etc.
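
To put rough numbers on that, here is a toy cost model (not HDFS code). It assumes what is described above: a streaming read holds one datanode connection per block, while every pread opens a new one. In the client API a pread is, as far as I know, the positional read(position, buffer, offset, length) call on FSDataInputStream. The class name, block size, read size and round-trip time below are all hypothetical, and the model assumes no pread crosses a block boundary.

```java
/**
 * Toy cost model, not HDFS code: counts datanode connections for streaming
 * reads versus preads, under the assumptions stated in the text above.
 */
final class PreadCostModel {

    /** Streaming: one connection per block touched, however many read() calls. */
    static long streamingConnections(long totalBytes, long blockSize) {
        return (totalBytes + blockSize - 1) / blockSize; // ceiling division
    }

    /** pread as described in this thread: one new connection per call. */
    static long preadConnections(long numReads) {
        return numReads;
    }

    /** Time spent on TCP handshakes alone, one round trip per connection. */
    static long handshakeMicros(long connections, long rttMicros) {
        return connections * rttMicros;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // hypothetical block size
        long readSize = 64L * 1024;         // hypothetical size of each read
        long numReads = blockSize / readSize;
        long rttMicros = 200;               // hypothetical LAN round trip

        long streaming = streamingConnections(blockSize, blockSize);
        long pread = preadConnections(numReads);
        System.out.println("streaming connections: " + streaming);
        System.out.println("pread connections:     " + pread);
        System.out.println("extra handshake time (us): "
                + handshakeMicros(pread - streaming, rttMicros));
    }
}
```

This only counts handshakes; the thread setup/teardown on the datanode comes on top of it and is not modelled.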


>
> > So, although it's not explicitly reading ahead, most of the
> > reads on DFSInputStream should be coming from the TCP receive buffer, not
> > making round trips.
> >
> > At one point a few weeks ago I did hack explicit readahead around
> > DFSInputStream and didn't see an appreciable difference. I didn't spend
> > much time on it, though, so I may have screwed something up - wasn't a
> > scientific test.
> >
>
> Speaking as someone who's worked with storage systems that do an explicit
> readahead, this can turn out to be a big giant disaster if it's combined
> with random reads.
>
> Big disaster as far as application-level throughput goes - but does make
> for impressive ganglia graphs!
>
> Brian
>
> > -Todd
> >
> > On Tue, Nov 24, 2009 at 10:02 AM, Eli Collins <[email protected]> wrote:
> >
> >> Hey Martin,
> >>
> >> It would be an interesting experiment but I'm not sure it would
> >> improve things as the host (and hardware to some extent) are already
> >> reading ahead. A useful exercise would be to evaluate whether the new
> >> default host parameters for on-demand readahead are suitable for
> >> hadoop.
> >>
> >> http://lwn.net/Articles/235164
> >> http://lwn.net/Articles/235181
> >>
> >> Thanks,
> >> Eli
> >>
> >> On Mon, Nov 23, 2009 at 11:23 PM, Martin Mituzas <[email protected]> wrote:
> >>>
> >>> I read the code and found that the call
> >>> DFSInputStream.read(buf, off, len)
> >>> causes the DataNode to read len bytes (or fewer if it encounters the end
> >>> of the block). Why doesn't HDFS read ahead to improve performance for
> >>> sequential reads?
> >>> --
> >>> View this message in context:
> >>> http://old.nabble.com/why-does-not-hdfs-read-ahead---tp26491449p26491449.html
> >>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
> >>>
> >>>
> >>
>
>
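
The flow-control point quoted at the top of the thread (the DN writes the whole block and TCP backpressure on the send window throttles it) can be seen with plain sockets. The sketch below is not HDFS code and does not use the xceiver protocol: it is a loopback demo in which a hypothetical "sender" plays the datanode and a "reader" plays the client. The class name, the 64 MB block size and the 64 KB chunk size are assumptions made for illustration. The sender is put in non-blocking mode so that a full send buffer shows up as write() returning 0 instead of a blocked thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

final class BackpressureDemo {

    /** Writes up to limit bytes, stopping as soon as the kernel accepts no more. */
    static long writeUntilStalled(SocketChannel sender, long limit) throws IOException {
        ByteBuffer chunk = ByteBuffer.allocate(64 * 1024);
        long sent = 0;
        while (sent < limit) {
            chunk.clear();
            chunk.limit((int) Math.min(chunk.capacity(), limit - sent));
            int n = sender.write(chunk); // non-blocking: 0 means the send buffer is full
            if (n == 0) {
                break;
            }
            sent += n;
        }
        return sent;
    }

    /** Returns {bytes accepted before the reader reads, bytes accepted after it drains}. */
    static long[] run(long blockSize) {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel reader = SocketChannel.open(server.getLocalAddress());
                 SocketChannel sender = server.accept()) {
                sender.configureBlocking(false);

                // Phase 1: the reader is idle, so the sender fills the TCP buffers and stalls.
                long beforeRead = writeUntilStalled(sender, blockSize);

                // Phase 2: the reader consumes what is buffered, reopening the window.
                ByteBuffer sink = ByteBuffer.allocate(64 * 1024);
                long consumed = 0;
                while (consumed < beforeRead) {
                    sink.clear();
                    int n = reader.read(sink);
                    if (n < 0) {
                        break;
                    }
                    consumed += n;
                }

                // Phase 3: the sender can make progress again (retry briefly if needed).
                long afterRead = 0;
                for (int i = 0; i < 200 && afterRead == 0; i++) {
                    afterRead = writeUntilStalled(sender, blockSize - beforeRead);
                    if (afterRead == 0) {
                        Thread.sleep(10);
                    }
                }
                return new long[] {beforeRead, afterRead};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long[] r = run(64L * 1024 * 1024); // hypothetical block size
        System.out.println("accepted while reader idle:  " + r[0] + " bytes");
        System.out.println("accepted after reader drain: " + r[1] + " bytes");
    }
}
```

How many bytes the sender gets out before stalling depends on the OS socket buffer settings, so the numbers will differ from machine to machine. Whatever was accepted while the reader was idle is the data a later read() can pick up from the receive buffer without a round trip.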
