One performance problem right now is our inability to push IO down into the kernel; this is where async APIs help. A full read in HBase might require reading 10+ files before ever returning a single row. Doing those reads in parallel would be nice, but spawning 10+ threads isn't really a good idea.
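To make that concrete, here's a minimal sketch (plain local-file Java, not HDFS code) of what issuing 10 reads in parallel looks like with an asynchronous channel instead of 10 blocked threads. The paths are made up, and the JDK's AsynchronousFileChannel may service these from an internal pool on some platforms, so treat it as an illustration of the programming model rather than guaranteed kernel AIO:

import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;

public class ParallelReadSketch {
  public static void main(String[] args) throws Exception {
    List<Path> files = new ArrayList<Path>();
    for (int i = 0; i < 10; i++) {
      files.add(Paths.get("/data/region/cf/hfile-" + i)); // hypothetical store file paths
    }

    List<AsynchronousFileChannel> channels = new ArrayList<AsynchronousFileChannel>();
    List<Future<Integer>> pending = new ArrayList<Future<Integer>>();

    // Issue all reads up front; none of them parks an application thread.
    for (Path p : files) {
      AsynchronousFileChannel ch = AsynchronousFileChannel.open(p, StandardOpenOption.READ);
      channels.add(ch);
      pending.add(ch.read(ByteBuffer.allocate(64 * 1024), 0L)); // first 64KB of each file
    }

    // All reads are already in flight, so waiting on each Future just collects
    // results; the waiting costs one caller thread total, not one per file.
    for (int i = 0; i < pending.size(); i++) {
      System.out.println(files.get(i) + ": " + pending.get(i).get() + " bytes");
      channels.get(i).close();
    }
  }
}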
Right now Hadoop scales by adding processes; we just don't have that option.

On Saturday, August 28, 2010, Todd Lipcon <[email protected]> wrote:
> Agreed, I think we'll get more bang for our buck by finishing up (reviving)
> patches like HDFS-941 or HDFS-347. Unfortunately performance doesn't seem to
> be the highest priority among our customers, so it's tough to find much time
> to work on these things until we really get stability up to par.
>
> -Todd
>
> On Sat, Aug 28, 2010 at 3:36 PM, Jay Booth <[email protected]> wrote:
>
>> I don't think async is a magic bullet for its own sake; we've all
>> seen those papers that show good performance from blocking
>> implementations. In particular, I don't think async is worth a whole
>> lot on the client side of a service, which HBase is to HDFS.
>>
>> What about an HDFS call for localize(Path) which attempts to replicate
>> the blocks for a file to the local datanode (if any) in a background
>> thread? If RegionServers called that function for their files every
>> so often, you'd eliminate a lot of bandwidth constraints, although the
>> latency of establishing a local socket for every read is still there.
>>
>> On Sat, Aug 28, 2010 at 4:42 PM, Todd Lipcon <[email protected]> wrote:
>> > On Sat, Aug 28, 2010 at 1:38 PM, Ryan Rawson <[email protected]> wrote:
>> >
>> >> One thought I had was that if we have the Writable code, surely just
>> >> putting a different transport around it wouldn't be THAT bad, right :-)
>> >>
>> >> Of course Writables are really tied to that DataInputStream or
>> >> whatever, so we'd have to work on that. Benoit said something about
>> >> Writables needing to do blocking reads and that causing issues, but
>> >> there was a Netty 3 feature specifically designed to handle that by
>> >> throwing and retrying the op later when there was more data.
>> >>
>> > The data transfer protocol actually doesn't do anything with Writables -
>> > it's all hand-coded bytes going over the transport.
>> >
>> > I have some code floating around somewhere for translating between blocking
>> > IO and Netty - not sure where, though :)
>> >
>> > -Todd
>> >
>> >> On Sat, Aug 28, 2010 at 1:32 PM, Todd Lipcon <[email protected]> wrote:
>> >> > On Sat, Aug 28, 2010 at 1:29 PM, Ryan Rawson <[email protected]> wrote:
>> >> >
>> >> >> A production server should be CPU bound, with memory caching etc. Our
>> >> >> prod systems do see a reasonable load, and jstack always shows some
>> >> >> kind of wait generally...
>> >> >>
>> >> >> But we need more IO pushdown into HDFS. For example, if we are loading
>> >> >> regions, why not do N at the same time? That figure N is probably
>> >> >> more dependent on how many disks/node you have than anything else
>> >> >> really.
>> >> >>
>> >> >> For simple reads (eg: HFile) would it really be that hard to retrofit
>> >> >> some kind of async Netty-based API on top of the existing DFSClient
>> >> >> logic?
>> >> >>
>> >> > Would probably be a duplication rather than a retrofit, but it's probably
>> >> > doable -- the protocol is pretty simple for reads, and failure/retry is much
>> >> > less complicated compared to writes (though still pretty complicated).
>> >> >
>> >> >> -ryan
>> >> >>
>> >> >> On Sat, Aug 28, 2010 at 1:11 PM, Todd Lipcon <[email protected]> wrote:
>> >> >> > Depending on the workload, parallelism doesn't seem to matter much. On my
>> >> >> > 8-core Nehalem test cluster with 12 disks each, I'm always network bound far
>> >> >> > before I'm CPU bound for most benchmarks, i.e. jstacks show threads mostly
>> >> >> > waiting for IO to happen, not blocked on locks.
>> >> >> >
>> >> >> > Is that not the case for your production boxes?
>> >> >> >
>> >> >> > On Sat, Aug 28, 2010 at 1:07 PM, Ryan Rawson <
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
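To sketch what the "async Netty-based API on top of the existing DFSClient logic" idea from the thread above might look like as an interface - nothing like this exists today, and every name here (AsyncBlockReader, ReadCallback) is hypothetical:

import java.nio.ByteBuffer;

/** Callback-style read API, in the spirit of the Netty idea discussed above. */
interface ReadCallback {
  void onData(ByteBuffer chunk);   // a packet of block data arrived
  void onComplete();               // the requested range has been fully read
  void onError(Throwable cause);   // connection/checksum failure; caller may retry another replica
}

interface AsyncBlockReader {
  /**
   * Ask for `length` bytes of a file starting at `offset`. Returns
   * immediately; data is delivered on an IO thread via the callback,
   * so many reads can be outstanding without one thread per read.
   */
  void read(String path, long offset, long length, ReadCallback cb);
}

With something shaped like this, a RegionServer could keep N reads outstanding per store file while the IO threads do the socket work - the same "do N at the same time" pushdown discussed in the thread.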
