On Mon, Dec 6, 2010 at 3:13 PM, Rajappa Iyer <[email protected]> wrote:

> Jay Booth <[email protected]> writes:
>
> > I don't get what they're talking about with hiding I/O limitations.. if
> > the OS is doing a poor job of handling sequential readers, that's on the
> > OS and not Hadoop, no? In other words, I didn't see anything specific to
> > Hadoop in their "multiple readers slow down sequential access" statement,
> > it just may or may not be true for a given I/O subsystem. The operating
> > system is still getting "open file, read, read, read, close", whether
> > you're accessing that file locally or via a datanode. Datanodes don't
> > close files in between read calls, except at block boundaries.
>
> The root cause of the problem is the way map jobs are scheduled. Since
> the job execution overlaps, the reads from different jobs also overlap
> and hence increase seeks. Realistically, there's not much that the OS
> can do about it.
>
> What Vladimir is talking about is reducing the seek times by essentially
> serializing the reads through a single thread per disk. You could
> either cleverly reorganize the reads so that seek is minimized and/or
> read the entire block in one call.
>
> -rsi

I think that modern kernel and elevator implementations are in a better
place to make this decision than Hadoop most of the time. I'd be worried
about a lot of work going into an implementation that saves a little work
some of the time and loses a bunch the rest of the time. The existing
elevator algorithms are pretty good, and they're written in
super-duper-optimized C and run in kernel mode.. kinda hard to compete
with, and even if we do, how do we know we wouldn't wind up working
against them?
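For concreteness, the single-reader-per-disk scheme Rajappa describes could look roughly like the sketch below: one single-threaded executor per physical disk, with every block read for that disk queued onto its dedicated thread. This is purely illustrative and not HDFS code; the class and method names are made up. Jay's and Todd's counterpoint is that collapsing the queue depth to one request per disk leaves the kernel elevator nothing to reorder.

    import java.io.RandomAccessFile;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Hypothetical illustration of "one reader thread per physical drive":
    // every read for a given disk is funneled through that disk's single
    // thread, so the drive never sees more than one outstanding request
    // from this process.
    public class PerDiskReadSerializer {
        private final Map<String, ExecutorService> diskExecutors = new ConcurrentHashMap<>();

        private ExecutorService executorFor(String diskMount) {
            return diskExecutors.computeIfAbsent(
                diskMount, d -> Executors.newSingleThreadExecutor());
        }

        // Queue a positional read on the disk's dedicated thread and return a Future.
        public Future<byte[]> read(String diskMount, String path, long offset, int length) {
            return executorFor(diskMount).submit(() -> {
                try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
                    byte[] buffer = new byte[length];
                    file.seek(offset);
                    file.readFully(buffer);
                    return buffer;
                }
            });
        }

        public void shutdown() {
            diskExecutors.values().forEach(ExecutorService::shutdown);
        }
    }
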
> >
> > On Mon, Dec 6, 2010 at 2:39 PM, Vladimir Rodionov
> > <[email protected]> wrote:
> >
> >> Todd,
> >>
> >> There are some curious people who had spent time (and taxpayers' money :)
> >> and have come to the same conclusion (as me):
> >>
> >> http://www.jeffshafer.com/publications/papers/shafer_ispass10.pdf
> >>
> >> Best regards,
> >> Vladimir Rodionov
> >> Principal Platform Engineer
> >> Carrier IQ, www.carrieriq.com
> >> e-mail: [email protected]
> >>
> >> ________________________________________
> >> From: Todd Lipcon [[email protected]]
> >> Sent: Monday, December 06, 2010 10:04 AM
> >> To: [email protected]
> >> Subject: Re: Local sockets
> >>
> >> On Mon, Dec 6, 2010 at 9:59 AM, Vladimir Rodionov
> >> <[email protected]> wrote:
> >>
> >> > Todd,
> >> >
> >> > The major hdfs problem is inefficient processing of multiple streams
> >> > in parallel - multiple readers/writers per one physical drive result
> >> > in a significant drop in overall I/O throughput on Linux (tested with
> >> > ext3, ext4). There should be only one reader thread and one writer
> >> > thread per physical drive (until we get AIO support in Java).
> >> >
> >> > Multiple data buffer copies in the pipeline do not improve the
> >> > situation either.
> >>
> >> In my benchmarks, the copies account for only a minor amount of the
> >> overhead. Do a benchmark of ChecksumLocalFilesystem vs RawLocalFilesystem
> >> and you should see the 2x difference I mentioned for data that's in
> >> buffer cache.
> >>
> >> As for parallel reader streams, I disagree with your assessment. After
> >> tuning readahead and with a decent elevator algorithm (anticipatory seems
> >> best in my benchmarks) it's better to have multiple threads reading from
> >> a drive compared to one, unless we had AIO. Otherwise we won't be able to
> >> have multiple outstanding requests to the block device, and the elevator
> >> will be powerless to do any reordering of reads.
> >>
> >> > CRC32 can be fast btw and some other hashing algos can be even faster
> >> > (like murmur2 - 1.5GB per sec)
> >>
> >> Our CRC32 implementation goes around 750MB/sec on raw data, but for
> >> whatever undiscovered reason it adds a lot more overhead when you mix it
> >> into the data pipeline. HDFS-347 has some interesting benchmarks there.
> >>
> >> -Todd
> >>
> >> > ________________________________________
> >> > From: Todd Lipcon [[email protected]]
> >> > Sent: Saturday, December 04, 2010 3:04 PM
> >> > To: [email protected]
> >> > Subject: Re: Local sockets
> >> >
> >> > On Sat, Dec 4, 2010 at 2:57 PM, Vladimir Rodionov
> >> > <[email protected]> wrote:
> >> >
> >> > > From my own experiments, the performance difference is huge even on
> >> > > sequential R/W operations (up to 300%) when you do local File I/O vs
> >> > > HDFS File I/O.
> >> > >
> >> > > The overhead of HDFS I/O is substantial, to say the least.
> >> >
> >> > Much of this is from checksumming, though - turn off checksums and you
> >> > should see about a 2x improvement at least.
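As a rough illustration of the checksummed-vs-raw comparison Todd suggests, the sketch below times a sequential read of the same file through Hadoop's LocalFileSystem (the checksummed local filesystem) and through its underlying raw filesystem. It is a minimal sketch, assuming the standard FileSystem/LocalFileSystem API; the input path is a placeholder, the timing is deliberately crude, and checksums are only verified if a .crc sidecar file exists (e.g. because the file was written through the checksummed filesystem).

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocalFileSystem;
    import org.apache.hadoop.fs.Path;

    // Crude local benchmark: read the same file through the checksummed local
    // filesystem and through its raw (no-CRC) counterpart. Assumes the file
    // was written through the checksummed filesystem so a .crc sidecar exists,
    // and that it is small enough to sit in the buffer cache.
    public class LocalChecksumBench {
        static double readMillis(FileSystem fs, Path path) throws Exception {
            byte[] buffer = new byte[64 * 1024];
            long start = System.nanoTime();
            try (FSDataInputStream in = fs.open(path)) {
                while (in.read(buffer) != -1) {
                    // drain the file sequentially
                }
            }
            return (System.nanoTime() - start) / 1e6;
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            LocalFileSystem checksummed = FileSystem.getLocal(conf); // verifies CRCs
            FileSystem raw = checksummed.getRawFileSystem();         // skips CRCs
            Path path = new Path(args[0]);                           // placeholder input

            System.out.printf("checksummed: %.1f ms%n", readMillis(checksummed, path));
            System.out.printf("raw:         %.1f ms%n", readMillis(raw, path));
        }
    }
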
> >> >
> >> > -Todd
> >> >
> >> > > Best regards,
> >> > > Vladimir Rodionov
> >> > > Principal Platform Engineer
> >> > > Carrier IQ, www.carrieriq.com
> >> > > e-mail: [email protected]
> >> > >
> >> > > ________________________________________
> >> > > From: Todd Lipcon [[email protected]]
> >> > > Sent: Saturday, December 04, 2010 12:30 PM
> >> > > To: [email protected]
> >> > > Subject: Re: Local sockets
> >> > >
> >> > > Hi Leen,
> >> > >
> >> > > Check out HDFS-347 for more info on this. I hope to pick this back up
> >> > > in 2011 - in 2010 we mostly focused on stability above performance in
> >> > > HBase's interactions with HDFS.
> >> > >
> >> > > Thanks
> >> > > -Todd
> >> > >
> >> > > On Sat, Dec 4, 2010 at 12:28 PM, Leen Toelen <[email protected]> wrote:
> >> > >
> >> > > > Hi,
> >> > > >
> >> > > > has anyone tested the performance impact (when there is an hdfs
> >> > > > datanode and an hbase node on the same machine) of using unix domain
> >> > > > socket communication or shared memory ipc using nio? I guess this
> >> > > > should make a difference on reads?
> >> > > >
> >> > > > Regards,
> >> > > > Leen
> >> > >
> >> > > --
> >> > > Todd Lipcon
> >> > > Software Engineer, Cloudera
> >> >
> >> > --
> >> > Todd Lipcon
> >> > Software Engineer, Cloudera
> >>
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
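For anyone wanting to reproduce the "turn off checksums" experiment Todd mentions, client-side checksum verification can be disabled per FileSystem instance with setVerifyChecksum(false). A minimal sketch follows, assuming a configured Hadoop client; the input path is a placeholder and the ~2x figure should be treated as workload-dependent.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Read a file with client-side checksum verification turned off.
    public class NoChecksumRead {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);  // the configured default filesystem, e.g. HDFS
            fs.setVerifyChecksum(false);           // skip CRC verification on this client

            byte[] buffer = new byte[64 * 1024];
            try (FSDataInputStream in = fs.open(new Path(args[0]))) {
                while (in.read(buffer) != -1) {
                    // sequential read without checksum verification overhead
                }
            }
        }
    }
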
