So are we talking about re-implementing I/O scheduling in Hadoop at the application level?
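Something along those lines, if I understand Rajappa's point below correctly. For concreteness, here is a rough sketch (purely hypothetical, not existing Hadoop code - the class and method names are made up) of what serializing reads through a single thread per physical drive could look like, using one single-threaded executor per disk so that requests for the same spindle never interleave:

import java.io.RandomAccessFile;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: all reads targeting one physical drive are funneled
// through a single thread, so the disk sees one request at a time instead of
// interleaved seeks from many concurrently scheduled map tasks.
public class PerDiskReadScheduler {

  // One single-threaded executor per disk, keyed by mount point (e.g. "/data1").
  private final Map<String, ExecutorService> diskExecutors = new ConcurrentHashMap<>();

  private ExecutorService executorFor(String mountPoint) {
    return diskExecutors.computeIfAbsent(mountPoint,
        m -> Executors.newSingleThreadExecutor());
  }

  // Queue a read of 'length' bytes at 'offset' in 'path'; the caller can block
  // on the returned Future, but the actual disk access is serialized per drive.
  public Future<byte[]> read(String mountPoint, String path, long offset, int length) {
    return executorFor(mountPoint).submit(() -> {
      byte[] buf = new byte[length];
      try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
        raf.seek(offset);
        raf.readFully(buf);  // assumes offset + length is within the file
      }
      return buf;
    });
  }

  public void shutdown() {
    for (ExecutorService e : diskExecutors.values()) {
      e.shutdown();
    }
  }
}

Whether something like this actually beats letting the OS elevator reorder several outstanding requests is exactly the disagreement in the thread below.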
On Mon, Dec 6, 2010 at 12:13 PM, Rajappa Iyer <[email protected]> wrote:
> Jay Booth <[email protected]> writes:
>
>> I don't get what they're talking about with hiding I/O limitations... if
>> the OS is doing a poor job of handling sequential readers, that's on the
>> OS and not Hadoop, no? In other words, I didn't see anything specific to
>> Hadoop in their "multiple readers slow down sequential access" statement;
>> it just may or may not be true for a given I/O subsystem. The operating
>> system is still getting "open file, read, read, read, close", whether
>> you're accessing that file locally or via a datanode. Datanodes don't
>> close files in between read calls, except at block boundaries.
>
> The root cause of the problem is the way map jobs are scheduled. Since
> the job executions overlap, the reads from different jobs also overlap
> and hence increase seeks. Realistically, there's not much that the OS
> can do about it.
>
> What Vladimir is talking about is reducing the seek times by essentially
> serializing the reads through a single thread per disk. You could
> either cleverly reorganize the reads so that seeking is minimized and/or
> read the entire block in one call.
>
> -rsi
>
>> On Mon, Dec 6, 2010 at 2:39 PM, Vladimir Rodionov
>> <[email protected]> wrote:
>>
>>> Todd,
>>>
>>> There are some curious people who have spent time (and taxpayers'
>>> money :) and come to the same conclusion as me:
>>>
>>> http://www.jeffshafer.com/publications/papers/shafer_ispass10.pdf
>>>
>>> Best regards,
>>> Vladimir Rodionov
>>> Principal Platform Engineer
>>> Carrier IQ, www.carrieriq.com
>>> e-mail: [email protected]
>>>
>>> ________________________________________
>>> From: Todd Lipcon [[email protected]]
>>> Sent: Monday, December 06, 2010 10:04 AM
>>> To: [email protected]
>>> Subject: Re: Local sockets
>>>
>>> On Mon, Dec 6, 2010 at 9:59 AM, Vladimir Rodionov
>>> <[email protected]> wrote:
>>>
>>> > Todd,
>>> >
>>> > The major HDFS problem is inefficient processing of multiple streams
>>> > in parallel - multiple readers/writers per one physical drive result
>>> > in a significant drop in overall I/O throughput on Linux (tested with
>>> > ext3, ext4). There should be only one reader thread and one writer
>>> > thread per physical drive (until we get AIO support in Java).
>>> >
>>> > Multiple data buffer copies in the pipeline do not improve the
>>> > situation either.
>>>
>>> In my benchmarks, the copies account for only a minor amount of the
>>> overhead. Do a benchmark of ChecksumLocalFilesystem vs RawLocalFilesystem
>>> and you should see the 2x difference I mentioned for data that's in the
>>> buffer cache.
>>>
>>> As for parallel reader streams, I disagree with your assessment. After
>>> tuning readahead, and with a decent elevator algorithm (anticipatory
>>> seems best in my benchmarks), it's better to have multiple threads
>>> reading from a drive than one, unless we had AIO. Otherwise we won't be
>>> able to have multiple outstanding requests to the block device, and the
>>> elevator will be powerless to do any reordering of reads.
>>>
>>> > CRC32 can be fast, btw, and some other hashing algos can be even
>>> > faster (like murmur2 - 1.5GB per sec)
>>>
>>> Our CRC32 implementation runs at around 750MB/sec on raw data, but for
>>> whatever undiscovered reason it adds a lot more overhead when you mix it
>>> into the data pipeline. HDFS-347 has some interesting benchmarks there.
>>>
>>> -Todd
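(Interjecting on the CRC32 number: for anyone who wants to sanity-check raw checksum throughput on their own hardware, a trivial micro-benchmark over an in-memory buffer like the one below should do it. The 64MB buffer and iteration count are arbitrary, and this measures only java.util.zip.CRC32 itself - none of the extra copies or pipeline overhead Todd is referring to.)

import java.util.Random;
import java.util.zip.CRC32;

// Rough CRC32 throughput check over an in-memory buffer; no disk I/O involved.
public class Crc32Bench {
  public static void main(String[] args) {
    final int bufSize = 64 * 1024 * 1024;  // 64MB of random data
    final int iterations = 20;

    byte[] data = new byte[bufSize];
    new Random(42).nextBytes(data);

    CRC32 crc = new CRC32();
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      crc.reset();
      crc.update(data, 0, data.length);
    }
    double seconds = (System.nanoTime() - start) / 1e9;
    double mbPerSec = iterations * (bufSize / (1024.0 * 1024.0)) / seconds;

    System.out.printf("CRC32: %.0f MB/sec (last value = %d)%n", mbPerSec, crc.getValue());
  }
}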
>>> > ________________________________________
>>> > From: Todd Lipcon [[email protected]]
>>> > Sent: Saturday, December 04, 2010 3:04 PM
>>> > To: [email protected]
>>> > Subject: Re: Local sockets
>>> >
>>> > On Sat, Dec 4, 2010 at 2:57 PM, Vladimir Rodionov
>>> > <[email protected]> wrote:
>>> >
>>> > > From my own experiments, the performance difference is huge even on
>>> > > sequential R/W operations (up to 300%) when you do local file I/O vs
>>> > > HDFS file I/O.
>>> > >
>>> > > The overhead of HDFS I/O is substantial, to say the least.
>>> >
>>> > Much of this is from checksumming, though - turn off checksums and you
>>> > should see about a 2x improvement at least.
>>> >
>>> > -Todd
>>> >
>>> > > Best regards,
>>> > > Vladimir Rodionov
>>> > > Principal Platform Engineer
>>> > > Carrier IQ, www.carrieriq.com
>>> > > e-mail: [email protected]
>>> > >
>>> > > ________________________________________
>>> > > From: Todd Lipcon [[email protected]]
>>> > > Sent: Saturday, December 04, 2010 12:30 PM
>>> > > To: [email protected]
>>> > > Subject: Re: Local sockets
>>> > >
>>> > > Hi Leen,
>>> > >
>>> > > Check out HDFS-347 for more info on this. I hope to pick this back up
>>> > > in 2011 - in 2010 we mostly focused on stability over performance in
>>> > > HBase's interactions with HDFS.
>>> > >
>>> > > Thanks
>>> > > -Todd
>>> > >
>>> > > On Sat, Dec 4, 2010 at 12:28 PM, Leen Toelen <[email protected]> wrote:
>>> > >
>>> > > > Hi,
>>> > > >
>>> > > > Has anyone tested the performance impact (when there is an HDFS
>>> > > > datanode and an HBase node on the same machine) of using Unix
>>> > > > domain socket communication or shared-memory IPC using NIO? I
>>> > > > guess this should make a difference for reads?
>>> > > >
>>> > > > Regards,
>>> > > > Leen
>>> > >
>>> > > --
>>> > > Todd Lipcon
>>> > > Software Engineer, Cloudera
>>> >
>>> > --
>>> > Todd Lipcon
>>> > Software Engineer, Cloudera
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
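For anyone who wants to try the checksummed-vs-raw comparison Todd suggests above, a minimal sketch could look like the following (assuming the Hadoop common jars are on the classpath; LocalFileSystem is the checksumming local filesystem and getRawFileSystem() exposes the underlying RawLocalFileSystem). Note that the checksummed read only verifies data if a matching .crc sidecar file exists, e.g. because the file was originally written through LocalFileSystem, and that a file small enough to sit in the buffer cache isolates the CPU cost from disk speed:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;

// Read one local file through the checksummed and the raw local filesystem
// and report the throughput of each.
public class LocalReadBench {
  public static void main(String[] args) throws IOException {
    Path file = new Path(args[0]);  // an existing local file to read
    Configuration conf = new Configuration();

    LocalFileSystem checksummed = FileSystem.getLocal(conf);
    FileSystem raw = checksummed.getRawFileSystem();

    System.out.printf("checksummed: %.0f MB/sec%n", throughput(checksummed, file));
    System.out.printf("raw:         %.0f MB/sec%n", throughput(raw, file));
  }

  private static double throughput(FileSystem fs, Path file) throws IOException {
    byte[] buf = new byte[64 * 1024];
    long bytes = 0;
    long start = System.nanoTime();
    try (FSDataInputStream in = fs.open(file)) {
      int n;
      while ((n = in.read(buf)) != -1) {
        bytes += n;
      }
    }
    double seconds = (System.nanoTime() - start) / 1e9;
    return bytes / (1024.0 * 1024.0) / seconds;
  }
}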
