I don't think async is a magic bullet for its own sake; we've all seen the papers showing that blocking implementations can perform well. In particular, I don't think async buys a whole lot on the client side of a service, which is what HBase is to HDFS.
What about an HDFS call, localize(Path), that attempts to replicate the blocks for a file to the local datanode (if any) in a background thread? If RegionServers called that for their files every so often, you'd eliminate a lot of the bandwidth constraints, although the latency of establishing a local socket for every read would still be there. (Rough sketch at the bottom of this mail.)

On Sat, Aug 28, 2010 at 4:42 PM, Todd Lipcon <[email protected]> wrote:
> On Sat, Aug 28, 2010 at 1:38 PM, Ryan Rawson <[email protected]> wrote:
>
>> One thought I had was if we have the writable code, surely just
>> putting a different transport around it wouldn't be THAT bad right :-)
>>
>> Of course writables are really tied to that DataInputStream or
>> whatever, so we'd have to work on that. Benoit said something about
>> writables needing to do blocking reads and that causing issues, but
>> there was a netty3 thing specifically designed to handle that by
>> throwing and retrying the op later when there was more data.
>>
>
> The data transfer protocol actually doesn't do anything with Writables -
> it's all hand coded bytes going over the transport.
>
> I have some code floating around somewhere for translating between blocking
> IO and Netty - not sure where, though :)
>
> -Todd
>
>> On Sat, Aug 28, 2010 at 1:32 PM, Todd Lipcon <[email protected]> wrote:
>> > On Sat, Aug 28, 2010 at 1:29 PM, Ryan Rawson <[email protected]> wrote:
>> >
>> >> a production server should be CPU bound, with memory caching etc. Our
>> >> prod systems do see a reasonable load, and jstack always shows some
>> >> kind of wait generally...
>> >>
>> >> but we need more IO pushdown into HDFS. For example if we are loading
>> >> regions, why not do N at the same time? That figure N is probably
>> >> more dependent on how many disks/node you have than anything else
>> >> really.
>> >>
>> >> For simple reads (eg: hfile) would it really be that hard to retrofit
>> >> some kind of async netty based API on top of the existing DFSClient
>> >> logic?
>> >
>> > Would probably be a duplication rather than a retrofit, but it's probably
>> > doable -- the protocol is pretty simple for reads, and failure/retry is
>> > much less complicated compared to writes (though still pretty complicated)
>> >
>> >> -ryan
>> >>
>> >> On Sat, Aug 28, 2010 at 1:11 PM, Todd Lipcon <[email protected]> wrote:
>> >> > Depending on the workload, parallelism doesn't seem to matter much. On
>> >> > my 8-core Nehalem test cluster with 12 disks each, I'm always network
>> >> > bound far before I'm CPU bound for most benchmarks. ie jstacks show
>> >> > threads mostly waiting for IO to happen, not blocked on locks.
>> >> >
>> >> > Is that not the case for your production boxes?
>> >> >
>> >> > On Sat, Aug 28, 2010 at 1:07 PM, Ryan Rawson <[email protected]> wrote:
>> >> >
>> >> >> bigtable was written for 1 core machines, with ~ 100 regions per box.
>> >> >> Thanks to CMS we generally can't run on < 4 cores, and at this point
>> >> >> 16 core machines (with HTT) is becoming pretty standard.
>> >> >>
>> >> >> The question is, how do we leverage the ever-increasing sizes of
>> >> >> machines and differentiate ourselves from bigtable? What did google
>> >> >> do (if anything) to adapt to the 16 core machines? We should be able
>> >> >> to do quite a bit on a 20 or 40 node cluster.
>> >> >>
>> >> >> more thread parallelism?
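
P.S. Since no localize() call exists in HDFS today, here's a rough sketch of what the RegionServer side could look like -- every name in it (LocalizableFileSystem, localize, LocalizerChore) is made up for illustration:

import java.io.IOException;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.fs.Path;

// Hypothetical API -- this is the proposal above, not an existing HDFS call.
interface LocalizableFileSystem {
  // Ask the NameNode to re-replicate the file's blocks onto the datanode
  // co-located with this client, in a background thread on the cluster side.
  void localize(Path path) throws IOException;
}

// Periodic chore a RegionServer could run over its store files.
class LocalizerChore implements Runnable {
  private final LocalizableFileSystem fs;
  private final List<Path> storeFiles;

  LocalizerChore(LocalizableFileSystem fs, List<Path> storeFiles) {
    this.fs = fs;
    this.storeFiles = storeFiles;
  }

  public void run() {
    for (Path p : storeFiles) {
      try {
        fs.localize(p); // ideally a no-op when a local replica already exists
      } catch (IOException e) {
        // best effort: skip this file and retry on the next pass
      }
    }
  }

  static void schedule(LocalizerChore chore) {
    ScheduledExecutorService exec =
        Executors.newSingleThreadScheduledExecutor();
    exec.scheduleWithFixedDelay(chore, 10, 10, TimeUnit.MINUTES);
  }
}

The call would want to be idempotent and best-effort so that hammering it every few minutes stays cheap.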
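
Also, the netty3 thing mentioned up-thread for blocking-style reads is, I'm fairly sure, ReplayingDecoder: you write decode() as if every byte had already arrived, and when the buffer runs dry it throws internally, rewinds, and replays the call once more data shows up. A minimal length-prefixed-frame decoder:

import org.jboss.netty.buffer.ChannelBuffer;
import org.jboss.netty.channel.Channel;
import org.jboss.netty.channel.ChannelHandlerContext;
import org.jboss.netty.handler.codec.replay.ReplayingDecoder;
import org.jboss.netty.handler.codec.replay.VoidEnum;

public class LengthPrefixedDecoder extends ReplayingDecoder<VoidEnum> {
  @Override
  protected Object decode(ChannelHandlerContext ctx, Channel channel,
                          ChannelBuffer buffer, VoidEnum state) {
    // Looks like a blocking read, but if fewer than 4 bytes are buffered,
    // ReplayingDecoder throws internally, rewinds the buffer, and calls
    // decode() again once more data arrives.
    int length = buffer.readInt();
    return buffer.readBytes(length); // the decoded frame
  }
}

The catch is that decode() may run several times for a single frame, so it has to stay side-effect free until it returns (or use checkpoint() to save progress).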

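And on loading N regions at once: the natural shape is a fixed pool sized by spindles rather than cores, since opening regions is IO bound. Sketch only -- openRegion() stands in for the real open path, and numDisks would come from config:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ParallelRegionOpener {
  // Open all assigned regions with up to numDisks opens in flight at once.
  void openAll(List<String> regions, int numDisks) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(numDisks);
    for (final String region : regions) {
      pool.submit(new Runnable() {
        public void run() {
          openRegion(region); // placeholder for the actual open logic
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
  }

  private void openRegion(String region) {
    // read hfiles, replay logs, etc. -- the IO-bound part
  }
}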