Jay Booth <[email protected]> writes:
> I don't get what they're talking about with hiding I/O limitations.. if the
> OS is doing a poor job of handling sequential readers, that's on the OS and
> not Hadoop, no? In other words, I didn't see anything specific to Hadoop in
> their "multiple readers slow down sequential access" statement, it just may
> or may not be true for a given I/O subsystem. The operating system is still
> getting "open file, read, read, read, close", whether you're accessing that
> file locally or via a datanode. Datanodes don't close files in between read
> calls, except at block boundaries.
The root cause of the problem is the way map jobs are scheduled. Since job
execution overlaps, reads from different jobs also overlap and hence increase
seeks. Realistically, there is not much the OS can do about it. What Vladimir
is talking about is reducing seek time by essentially serializing the reads
through a single thread per disk: you could reorder the reads so that seeking
is minimized, read the entire block in one call, or both. (A rough sketch of
the one-thread-per-disk idea is at the bottom of this mail.)

-rsi

>
> On Mon, Dec 6, 2010 at 2:39 PM, Vladimir Rodionov
> <[email protected]> wrote:
>
>> Todd,
>>
>> There are some curious people who had spent time (and tax payers money :)
>> and have came to the same conclusion (as me):
>>
>> http://www.jeffshafer.com/publications/papers/shafer_ispass10.pdf
>>
>>
>> Best regards,
>> Vladimir Rodionov
>> Principal Platform Engineer
>> Carrier IQ, www.carrieriq.com
>> e-mail: [email protected]
>>
>> ________________________________________
>> From: Todd Lipcon [[email protected]]
>> Sent: Monday, December 06, 2010 10:04 AM
>> To: [email protected]
>> Subject: Re: Local sockets
>>
>> On Mon, Dec 6, 2010 at 9:59 AM, Vladimir Rodionov
>> <[email protected]> wrote:
>>
>> > Todd,
>> >
>> > The major hdfs problem is inefficient processing of multiple streams in
>> > parallel -
>> > multiple readers/writers per one physical drive result in significant drop
>> > in overall
>> > I/O throughput on Linux (tested with ext3, ext4). There should be only one
>> > reader thread,
>> > one writer thread per physical drive (until we get AIO support in Java)
>> >
>> > Multiple data buffer copies in pipeline do not improve situation as well.
>> >
>>
>> In my benchmarks, the copies account for only a minor amount of the
>> overhead. Do a benchmark of ChecksumLocalFilesystem vs RawLocalFilesystem
>> and you should see the 2x difference I mentioned for data that's in buffer
>> cache.
>>
>> As for parallel reader streams, I disagree with your assessment. After
>> tuning readahead and with a decent elevator algorithm (anticipatory seems
>> best in my benchmarks) it's better to have multiple threads reading from a
>> drive compared to one, unless we had AIO. Otherwise we won't be able to have
>> multiple outstanding requests to the block device, and the elevator will be
>> powerless to do any reordering of reads.
>>
>>
>> > CRC32 can be fast btw and some other hashing algos can be even faster (like
>> > murmur2 -1.5GB per sec)
>> >
>>
>> Our CRC32 implementation goes around 750MB/sec on raw data, but for whatever
>> undiscovered reason it adds a lot more overhead when you mix it into the
>> data pipeline. HDFS-347 has some interesting benchmarks there.
>>
>> -Todd
>>
>> >
>> > ________________________________________
>> > From: Todd Lipcon [[email protected]]
>> > Sent: Saturday, December 04, 2010 3:04 PM
>> > To: [email protected]
>> > Subject: Re: Local sockets
>> >
>> > On Sat, Dec 4, 2010 at 2:57 PM, Vladimir Rodionov
>> > <[email protected]> wrote:
>> >
>> > > From my own experiments performance difference is huge even on
>> > > sequential R/W operations (up to 300%) when you do local File I/O vs HDFS
>> > > File I/O
>> > >
>> > > Overhead of HDFS I/O is substantial to say the least.
>> > >
>> > >
>> > Much of this is from checksumming, though - turn off checksums and you
>> > should see about a 2x improvement at least.
>> >
>> > -Todd
>> >
>> >
>> > > Best regards,
>> > > Vladimir Rodionov
>> > > Principal Platform Engineer
>> > > Carrier IQ, www.carrieriq.com
>> > > e-mail: [email protected]
>> > >
>> > > ________________________________________
>> > > From: Todd Lipcon [[email protected]]
>> > > Sent: Saturday, December 04, 2010 12:30 PM
>> > > To: [email protected]
>> > > Subject: Re: Local sockets
>> > >
>> > > Hi Leen,
>> > >
>> > > Check out HDFS-347 for more info on this. I hope to pick this back up in
>> > > 2011 - in 2010 we mostly focused on stability above performance in HBase's
>> > > interactions with HDFS.
>> > >
>> > > Thanks
>> > > -Todd
>> > >
>> > > On Sat, Dec 4, 2010 at 12:28 PM, Leen Toelen <[email protected]> wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > has anyone tested the performance impact (when there is a hdfs
>> > > > datanode and a hbase node on the same machine) of using unix domain
>> > > > sockets communication or shared memory ipc using nio? I guess this
>> > > > should make a difference on reads?
>> > > >
>> > > > Regards,
>> > > > Leen
>> > > >
>> > >
>> > >
>> > > --
>> > > Todd Lipcon
>> > > Software Engineer, Cloudera
>> > >
>> >
>> >
>> > --
>> > Todd Lipcon
>> > Software Engineer, Cloudera
>> >
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
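P.S. Here is a minimal, hypothetical sketch of the one-thread-per-disk idea I
mean above. The class and method names are made up for illustration; this is
not actual Hadoop/HDFS code. Each physical drive gets its own single-threaded
executor, so all reads destined for that drive are queued and executed one at
a time instead of interleaving seeks from many concurrent map tasks:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch only: one reader thread per physical disk. */
public class PerDiskReadScheduler {

    // One single-threaded executor per drive, keyed by mount point.
    private final Map<String, ExecutorService> readers =
        new HashMap<String, ExecutorService>();

    public PerDiskReadScheduler(String... mountPoints) {
        for (String mount : mountPoints) {
            readers.put(mount, Executors.newSingleThreadExecutor());
        }
    }

    // Reads for the same drive queue behind one thread, so the drive sees
    // one stream at a time instead of interleaved requests from many tasks.
    public Future<byte[]> submitRead(String mount, Callable<byte[]> readTask) {
        return readers.get(mount).submit(readTask);
    }

    public void shutdown() {
        for (ExecutorService reader : readers.values()) {
            reader.shutdown();
        }
    }
}

A smarter version could sort the queued requests by block offset before
issuing them, which is the reordering part I mentioned, but the single queue
per drive is the main point.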
