What about the cautionary tale of xcievers?
On Aug 28, 2010 4:40 PM, "Jay Booth" <[email protected]> wrote:
> You guys could look at using ExecutorService -- set up a pool with max
> 1024 threads that are reused, then you're not spawning new threads for
> every read. Since those network waits are probably mostly latency,
> doing them in parallel could be a win that was possible from the HBase
> side. You might have problems with memory churn, though, if you're
> allocating 10+ buffers per read.
>
> On Sat, Aug 28, 2010 at 7:27 PM, Ryan Rawson <[email protected]> wrote:
>> One problem of performance right now is our inability to push io down
>> into the kernel. This is where async APIs help. A full read in hbase
>> might require reading 10+ files before ever returning a single row.
>> Doing these in parallel would be nice. Spawning 10+ threads isn't
>> really a good idea.
>>
>> Right now hadoop scales by adding processes, we just don't have that option.
>>
>> On Saturday, August 28, 2010, Todd Lipcon <[email protected]> wrote:
>>> Agreed, I think we'll get more bang for our buck by finishing up (reviving)
>>> patches like HDFS-941 or HDFS-347. Unfortunately performance doesn't seem to
>>> be the highest priority among our customers so it's tough to find much time
>>> to work on these things until we really get stability up to par.
>>>
>>> -Todd
>>>
>>> On Sat, Aug 28, 2010 at 3:36 PM, Jay Booth <[email protected]> wrote:
>>>
>>>> I don't think async is a magic bullet for its own sake, we've all
>>>> seen those papers that show good performance from blocking
>>>> implementations. Particularly, I don't think async is worth a whole
>>>> lot on the client side of a service, which HBase is to HDFS.
>>>>
>>>> What about an HDFS call for localize(Path) which attempts to replicate
>>>> the blocks for a file to the local datanode (if any) in a background
>>>> thread? If RegionServers called that function for their files every
>>>> so often, you'd eliminate a lot of bandwidth constraints, although the
>>>> latency of establishing a local socket for every read is still there.
>>>>
>>>> On Sat, Aug 28, 2010 at 4:42 PM, Todd Lipcon <[email protected]> wrote:
>>>> > On Sat, Aug 28, 2010 at 1:38 PM, Ryan Rawson <[email protected]> wrote:
>>>> >
>>>> >> One thought I had was if we have the writable code, surely just
>>>> >> putting a different transport around it wouldn't be THAT bad right :-)
>>>> >>
>>>> >> Of course writables are really tied to that DataInputStream or
>>>> >> whatever, so we'd have to work on that. Benoit said something about
>>>> >> writables needing to do blocking reads and that causing issues, but
>>>> >> there was a netty3 thing specifically designed to handle that by
>>>> >> throwing and retrying the op later when there was more data.
>>>> >>
>>>> >>
>>>> > The data transfer protocol actually doesn't do anything with Writables -
>>>> > it's all hand coded bytes going over the transport.
>>>> >
>>>> > I have some code floating around somewhere for translating between blocking
>>>> > IO and Netty - not sure where, though :)
>>>> >
>>>> > -Todd
>>>> >
>>>> >
>>>> >> On Sat, Aug 28, 2010 at 1:32 PM, Todd Lipcon <[email protected]> wrote:
>>>> >> > On Sat, Aug 28, 2010 at 1:29 PM, Ryan Rawson <[email protected]> wrote:
>>>> >> >
>>>> >> >> a production server should be CPU bound, with memory caching etc. Our
>>>> >> >> prod systems do see a reasonable load, and jstack always shows some
>>>> >> >> kind of wait generally...
>>>> >> >>
>>>> >> >> but we need more IO pushdown into HDFS.
>>>> >> >> For example if we are loading
>>>> >> >> regions, why not do N at the same time? That figure N is probably
>>>> >> >> more dependent on how many disks/node you have than anything else
>>>> >> >> really.
>>>> >> >>
>>>> >> >> For simple reads (eg: hfile) would it really be that hard to retrofit
>>>> >> >> some kind of async netty based API on top of the existing DFSClient
>>>> >> >> logic?
>>>> >> >>
>>>> >> >
>>>> >> > Would probably be a duplication rather than a retrofit, but it's probably
>>>> >> > doable -- the protocol is pretty simple for reads, and failure/retry is much
>>>> >> > less complicated compared to writes (though still pretty complicated)
>>>> >> >
>>>> >> >
>>>> >> >>
>>>> >> >> -ryan
>>>> >> >>
>>>> >> >> On Sat, Aug 28, 2010 at 1:11 PM, Todd Lipcon <[email protected]> wrote:
>>>> >> >> > Depending on the workload, parallelism doesn't seem to matter much. On my
>>>> >> >> > 8-core Nehalem test cluster with 12 disks each, I'm always network bound far
>>>> >> >> > before I'm CPU bound for most benchmarks. ie jstacks show threads mostly
>>>> >> >> > waiting for IO to happen, not blocked on locks.
>>>> >> >> >
>>>> >> >> > Is that not the case for your production boxes?
>>>> >> >> >
>>>> >> >> > On Sat, Aug 28, 2010 at 1:07 PM, Ryan Rawson <
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>
>>
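
For reference, here is a minimal sketch of the ExecutorService idea Jay describes at the top of the thread: one shared, bounded pool that reuses threads and issues the 10+ per-row file reads Ryan mentions in parallel. The readOneFile helper, the readAll wrapper, and the class name are hypothetical stand-ins, not real HBase or HDFS methods; the only real knob referenced is the datanode-side dfs.datanode.max.xcievers limit behind the "cautionary tale" question.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ParallelHFileReads {

      // One shared, bounded pool: threads are reused instead of spawned per read.
      // 1024 is the figure from the thread; note that each in-flight read also
      // occupies a DataXceiver thread on the datanode, capped by
      // dfs.datanode.max.xcievers, so the two limits have to be sized together.
      private static final ExecutorService READ_POOL =
          Executors.newFixedThreadPool(1024);

      // Hypothetical stand-in for whatever actually pulls one file's bytes
      // through DFSClient; not a real HBase or HDFS method.
      static byte[] readOneFile(String path) {
        return new byte[0];
      }

      // Submit the 10+ reads needed for a row in parallel, then wait for all.
      static List<byte[]> readAll(List<String> paths) throws Exception {
        List<Future<byte[]>> pending = new ArrayList<Future<byte[]>>();
        for (final String path : paths) {
          pending.add(READ_POOL.submit(new Callable<byte[]>() {
            public byte[] call() throws Exception {
              return readOneFile(path);
            }
          }));
        }
        List<byte[]> results = new ArrayList<byte[]>();
        for (Future<byte[]> f : pending) {
          results.add(f.get());   // blocks until that read completes
        }
        return results;
      }
    }

Jay's memory-churn caveat applies to the 10+ buffers allocated per read, not to the pool itself; a fixed pool only bounds thread creation.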

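The "netty3 thing" Ryan mentions for Writable-style blocking reads is presumably ReplayingDecoder from Netty 3 (an assumption, not stated in the thread): decode logic is written as if every byte were already buffered, and when the buffer runs dry the decoder aborts internally and re-runs decode() once more data arrives. A sketch with an assumed length-prefixed frame, not the actual HDFS data transfer framing:

    import org.jboss.netty.buffer.ChannelBuffer;
    import org.jboss.netty.channel.Channel;
    import org.jboss.netty.channel.ChannelHandlerContext;
    import org.jboss.netty.handler.codec.replay.ReplayingDecoder;
    import org.jboss.netty.handler.codec.replay.VoidEnum;

    // Decodes a hypothetical length-prefixed frame. If the buffer does not yet
    // hold the whole frame, ReplayingDecoder rewinds and calls decode() again
    // when more bytes arrive -- the "throw and retry the op later" behaviour
    // mentioned in the thread.
    public class LengthPrefixedFrameDecoder extends ReplayingDecoder<VoidEnum> {
      @Override
      protected Object decode(ChannelHandlerContext ctx, Channel channel,
                              ChannelBuffer buffer, VoidEnum state) {
        int length = buffer.readInt();   // may trigger a replay if bytes are missing
        return buffer.readBytes(length); // whole frame guaranteed available here
      }
    }

This only sketches the decoding side; Todd's point stands that an async read path would be closer to a reimplementation of the data transfer protocol than a retrofit of DFSClient.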