On Mon, Dec 6, 2010 at 3:13 PM, Rajappa Iyer <[email protected]> wrote:

> Jay Booth <[email protected]> writes:
>
> > I don't get what they're talking about with hiding I/O limitations...
> > if the OS is doing a poor job of handling sequential readers, that's
> > on the OS and not Hadoop, no?  In other words, I didn't see anything
> > specific to Hadoop in their "multiple readers slow down sequential
> > access" statement, it just may or may not be true for a given I/O
> > subsystem.  The operating system is still getting "open file, read,
> > read, read, close", whether you're accessing that file locally or via
> > a datanode.  Datanodes don't close files in between read calls, except
> > at block boundaries.
>
> The root cause of the problem is the way map jobs are scheduled.  Since
> the job execution overlaps, the reads from different jobs also overlap
> and hence increase seeks.  Realistically, there's not much that the OS
> can do about it.
>
> What Vladimir is talking about is reducing the seek times by essentially
> serializing the reads through a single thread per disk.  You could
> either cleverly reorganize the reads so that seeking is minimized and/or
> read the entire block in one call.
>
> -rsi
>
I think that modern kernel and elevator implementations are in a better
place to make this decision than Hadoop most of the time.  I'd be worried
about a lot of work going into an implementation that saves a little work
some of the time and loses a bunch the rest of the time.  The existing
elevator algorithms are pretty good, and they're written in
super-duper-optimized C and run in kernel mode... kinda hard to compete
with, and even if we do, how do we know we wouldn't wind up working against
them?
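
A minimal sketch of the single-reader-thread-per-disk serialization Rajappa
describes, in plain Java: one single-threaded executor per physical drive, so
reads against the same spindle are queued and issued one at a time rather than
interleaved. The DiskReads class, its method names, and the mount-point keying
are hypothetical illustration, not Hadoop code:

    import java.io.RandomAccessFile;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Hypothetical sketch: serialize reads per physical drive through one thread each.
    public class DiskReads {
        // One single-threaded executor per disk, keyed by mount point (e.g. "/data1").
        private final Map<String, ExecutorService> perDisk = new ConcurrentHashMap<>();

        private ExecutorService executorFor(String mount) {
            return perDisk.computeIfAbsent(mount, m -> Executors.newSingleThreadExecutor());
        }

        // Queue a read of `length` bytes at `offset`; reads on the same disk never overlap.
        public Future<byte[]> read(String mount, String path, long offset, int length) {
            return executorFor(mount).submit(() -> {
                byte[] buf = new byte[length];
                try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
                    raf.seek(offset);
                    raf.readFully(buf);
                }
                return buf;
            });
        }
    }

Whether serializing like this actually beats the kernel's own reordering is
exactly the question raised above.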

> >
> > On Mon, Dec 6, 2010 at 2:39 PM, Vladimir Rodionov
> > <[email protected]>wrote:
> >
> >> Todd,
> >>
> >> There are some curious people who have spent time (and taxpayers'
> >> money :) and have come to the same conclusion (as I did):
> >>
> >> http://www.jeffshafer.com/publications/papers/shafer_ispass10.pdf
> >>
> >>
> >> Best regards,
> >> Vladimir Rodionov
> >> Principal Platform Engineer
> >> Carrier IQ, www.carrieriq.com
> >> e-mail: [email protected]
> >>
> >> ________________________________________
> >> From: Todd Lipcon [[email protected]]
> >> Sent: Monday, December 06, 2010 10:04 AM
> >> To: [email protected]
> >> Subject: Re: Local sockets
> >>
> >> On Mon, Dec 6, 2010 at 9:59 AM, Vladimir Rodionov
> >> <[email protected]>wrote:
> >>
> >> > Todd,
> >> >
> >> > The major HDFS problem is inefficient processing of multiple streams
> >> > in parallel - multiple readers/writers per one physical drive result
> >> > in a significant drop in overall I/O throughput on Linux (tested with
> >> > ext3, ext4). There should be only one reader thread and one writer
> >> > thread per physical drive (until we get AIO support in Java).
> >> >
> >> > Multiple data buffer copies in the pipeline do not improve the
> >> > situation either.
> >> >
> >>
> >> In my benchmarks, the copies account for only a minor amount of the
> >> overhead. Do a benchmark of ChecksumLocalFilesystem vs
> >> RawLocalFilesystem and you should see the 2x difference I mentioned
> >> for data that's in buffer cache.
> >>
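
For anyone who wants to try the comparison Todd suggests, a rough sketch
against the Hadoop local filesystem classes (LocalFileSystem is the checksummed
wrapper; getRawFileSystem() exposes the underlying RawLocalFileSystem). The
path is a placeholder, and for the checksummed run the file needs to have been
written through the checksummed filesystem so a .crc sidecar exists; otherwise
no verification happens on read:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocalFileSystem;
    import org.apache.hadoop.fs.Path;

    public class ChecksumVsRaw {
        // Time how long it takes to drain the whole file through the given filesystem.
        static long timeReadNanos(FileSystem fs, Path p) throws Exception {
            byte[] buf = new byte[64 * 1024];
            long start = System.nanoTime();
            try (FSDataInputStream in = fs.open(p)) {
                while (in.read(buf) != -1) { /* discard */ }
            }
            return System.nanoTime() - start;
        }

        public static void main(String[] args) throws Exception {
            Path p = new Path(args[0]);   // a large local file, ideally already in buffer cache
            Configuration conf = new Configuration();
            LocalFileSystem checksummed = FileSystem.getLocal(conf);  // ChecksumFileSystem wrapper
            FileSystem raw = checksummed.getRawFileSystem();          // plain RawLocalFileSystem
            System.out.println("checksummed ns: " + timeReadNanos(checksummed, p));
            System.out.println("raw ns:         " + timeReadNanos(raw, p));
        }
    }
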
> >> As for parallel reader streams, I disagree with your assessment. After
> >> tuning readahead and with a decent elevator algorithm (anticipatory
> >> seems best in my benchmarks) it's better to have multiple threads
> >> reading from a drive than just one, unless we had AIO. Otherwise we
> >> won't be able to have multiple outstanding requests to the block
> >> device, and the elevator will be powerless to do any reordering of
> >> reads.
> >>
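
A small plain-java.nio illustration of the "multiple outstanding requests"
point: several threads issuing positional reads against the same file give the
kernel's elevator more than one request to merge or reorder at a time. Thread
count, region size, and read size below are arbitrary:

    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class ParallelReaders {
        public static void main(String[] args) throws Exception {
            int threads = 4;                      // several streams hitting the same drive
            long region = 256L * 1024 * 1024;     // each thread reads its own 256 MB region
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            try (FileChannel ch = FileChannel.open(Paths.get(args[0]), StandardOpenOption.READ)) {
                for (int i = 0; i < threads; i++) {
                    final long base = i * region;
                    pool.submit(() -> {
                        ByteBuffer buf = ByteBuffer.allocate(1 << 20);   // 1 MB positional reads
                        long pos = base;
                        int n;
                        // FileChannel positional reads are safe to issue concurrently.
                        while (pos < base + region && (n = ch.read(buf, pos)) > 0) {
                            pos += n;
                            buf.clear();
                        }
                        return null;
                    });
                }
                pool.shutdown();
                pool.awaitTermination(1, TimeUnit.HOURS);
            }
        }
    }

With only one reader thread the queue depth at the device is effectively one,
which is the case being argued against here.
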
> >>
> >> > CRC32 can be fast, btw, and some other hashing algos can be even
> >> > faster (like murmur2, ~1.5 GB per sec).
> >> >
> >>
> >> Our CRC32 implementation runs at around 750 MB/sec on raw data, but
> >> for whatever undiscovered reason it adds a lot more overhead when you
> >> mix it into the data pipeline. HDFS-347 has some interesting
> >> benchmarks there.
> >>
> >> -Todd
> >>
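
For context on the raw-speed numbers being quoted, a throughput
micro-benchmark of java.util.zip.CRC32 on its own is easy to run. Buffer size
and iteration count here are arbitrary, and this deliberately says nothing
about the extra cost of wiring checksums into the read pipeline, which is the
part Todd says is still unexplained:

    import java.util.zip.CRC32;

    public class Crc32Bench {
        public static void main(String[] args) {
            byte[] data = new byte[64 * 1024 * 1024];   // 64 MB buffer; contents don't matter for speed
            CRC32 crc = new CRC32();
            for (int i = 0; i < 3; i++) {               // warm up the JIT before timing
                crc.update(data, 0, data.length);
            }
            int iters = 16;
            long start = System.nanoTime();
            for (int i = 0; i < iters; i++) {
                crc.reset();
                crc.update(data, 0, data.length);
            }
            double secs = (System.nanoTime() - start) / 1e9;
            double mbPerSec = (double) data.length * iters / (1024 * 1024) / secs;
            System.out.printf("CRC32: %.0f MB/sec (last checksum=%d)%n", mbPerSec, crc.getValue());
        }
    }
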
> >> >
> >> > ________________________________________
> >> > From: Todd Lipcon [[email protected]]
> >> > Sent: Saturday, December 04, 2010 3:04 PM
> >> > To: [email protected]
> >> > Subject: Re: Local sockets
> >> >
> >> > On Sat, Dec 4, 2010 at 2:57 PM, Vladimir Rodionov
> >> > <[email protected]>wrote:
> >> >
> >> > > From my own experiments, the performance difference is huge even
> >> > > on sequential R/W operations (up to 300%) when you do local file
> >> > > I/O vs HDFS file I/O.
> >> > >
> >> > > The overhead of HDFS I/O is substantial, to say the least.
> >> > >
> >> > >
> >> > Much of this is from checksumming, though - turn off checksums and you
> >> > should see about a 2x improvement at least.
> >> >
> >> > -Todd
> >> >
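
If someone wants to reproduce the "turn off checksums" comparison Todd
mentions, a hedged sketch, assuming FileSystem#setVerifyChecksum is available
in the Hadoop version in use; the namenode URI and file path below are
placeholders, not real cluster settings:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class NoChecksumRead {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder cluster URI and file path, purely for illustration.
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
            fs.setVerifyChecksum(false);   // skip client-side checksum verification on read
            byte[] buf = new byte[128 * 1024];
            long bytes = 0;
            long start = System.nanoTime();
            try (FSDataInputStream in = fs.open(new Path("/benchmarks/bigfile"))) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    bytes += n;
                }
            }
            double secs = (System.nanoTime() - start) / 1e9;
            System.out.printf("read %d bytes in %.2f s without checksum verification%n", bytes, secs);
        }
    }

Running the same loop with setVerifyChecksum(true) gives the baseline for the
comparison.
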
> >> >
> >> > > Best regards,
> >> > > Vladimir Rodionov
> >> > > Principal Platform Engineer
> >> > > Carrier IQ, www.carrieriq.com
> >> > > e-mail: [email protected]
> >> > >
> >> > > ________________________________________
> >> > > From: Todd Lipcon [[email protected]]
> >> > > Sent: Saturday, December 04, 2010 12:30 PM
> >> > > To: [email protected]
> >> > > Subject: Re: Local sockets
> >> > >
> >> > > Hi Leen,
> >> > >
> >> > > Check out HDFS-347 for more info on this. I hope to pick this back
> >> > > up in 2011 - in 2010 we mostly focused on stability over
> >> > > performance in HBase's interactions with HDFS.
> >> > >
> >> > > Thanks
> >> > > -Todd
> >> > >
> >> > > On Sat, Dec 4, 2010 at 12:28 PM, Leen Toelen <[email protected]>
> >> > > wrote:
> >> > >
> >> > > > Hi,
> >> > > >
> >> > > > has anyone tested the performance impact (when there is an HDFS
> >> > > > datanode and an HBase node on the same machine) of using Unix
> >> > > > domain socket communication or shared-memory IPC using NIO? I
> >> > > > guess this should make a difference on reads?
> >> > > >
> >> > > > Regards,
> >> > > > Leen
> >> > > >
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Todd Lipcon
> >> > > Software Engineer, Cloudera
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Todd Lipcon
> >> > Software Engineer, Cloudera
> >> >
> >>
> >>
> >>
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >>
>
