You guys could look at using ExecutorService -- set up a pool with max
1024 threads that are reused, so you're not spawning new threads for
every read.  Since those network waits are probably mostly latency,
doing them in parallel could be a win, and it's possible from the
HBase side.  You might have problems with memory churn, though, if
you're allocating 10+ buffers per read.
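Something along these lines (an untested sketch -- readBlock here is just a placeholder for whatever does the actual per-file read, not a real HBase or HDFS call):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelReads {
    // One shared, bounded pool; threads are reused instead of being
    // spawned per read.
    static final ExecutorService POOL = Executors.newFixedThreadPool(1024);

    // Placeholder for reading one file/block -- not a real HBase/HDFS API.
    static byte[] readBlock(String path) {
        return path.getBytes();
    }

    // Submit all reads at once so the network waits overlap, then collect.
    static List<byte[]> readAll(List<String> paths) throws Exception {
        List<Callable<byte[]>> tasks = new ArrayList<Callable<byte[]>>();
        for (final String p : paths) {
            tasks.add(new Callable<byte[]>() {
                public byte[] call() {
                    return readBlock(p);
                }
            });
        }
        List<byte[]> results = new ArrayList<byte[]>();
        for (Future<byte[]> f : POOL.invokeAll(tasks)) {
            results.add(f.get()); // rethrows any per-read failure
        }
        return results;
    }
}
```

invokeAll blocks until every task finishes and returns futures in submission order, so callers still see results in order even though the reads ran in parallel.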

On Sat, Aug 28, 2010 at 7:27 PM, Ryan Rawson <[email protected]> wrote:
> One performance problem right now is our inability to push IO down
> into the kernel. This is where async APIs help. A full read in HBase
> might require reading 10+ files before ever returning a single row.
> Doing these in parallel would be nice. Spawning 10+ threads isn't
> really a good idea.
>
> Right now Hadoop scales by adding processes; we just don't have that option.
>
> On Saturday, August 28, 2010, Todd Lipcon <[email protected]> wrote:
>> Agreed, I think we'll get more bang for our buck by finishing up (reviving)
>> patches like HDFS-941 or HDFS-347. Unfortunately, performance doesn't seem
>> to be the highest priority among our customers, so it's tough to find much
>> time to work on these things until we really get stability up to par.
>>
>> -Todd
>>
>> On Sat, Aug 28, 2010 at 3:36 PM, Jay Booth <[email protected]> wrote:
>>
>>> I don't think async is a magic bullet for its own sake; we've all
>>> seen those papers that show good performance from blocking
>>> implementations.  In particular, I don't think async is worth a whole
>>> lot on the client side of a service, which is what HBase is to HDFS.
>>>
>>> What about an HDFS call for localize(Path) which attempts to replicate
>>> the blocks for a file to the local datanode (if any) in a background
>>> thread?  If RegionServers called that function for their files every
>>> so often, you'd eliminate a lot of bandwidth constraints, although the
>>> latency of establishing a local socket for every read is still there.
>>>
>>> On Sat, Aug 28, 2010 at 4:42 PM, Todd Lipcon <[email protected]> wrote:
>>> > On Sat, Aug 28, 2010 at 1:38 PM, Ryan Rawson <[email protected]> wrote:
>>> >
>>> >> One thought I had was if we have the writable code, surely just
>>> >> putting a different transport around it wouldn't be THAT bad right :-)
>>> >>
>>> >> Of course writables are really tied to that DataInputStream or
>>> >> whatever, so we'd have to work on that.  Benoit said something about
>>> >> writables needing to do blocking reads and that causing issues, but
>>> >> there was a netty3 thing specifically designed to handle that by
>>> >> throwing and retrying the op later when there was more data.
>>> >>
>>> >>
>>> > The data transfer protocol actually doesn't do anything with Writables -
>>> > it's all hand coded bytes going over the transport.
>>> >
>>> > I have some code floating around somewhere for translating between
>>> > blocking IO and Netty - not sure where, though :)
>>> >
>>> > -Todd
>>> >
>>> >
>>> >> >  On Sat, Aug 28, 2010 at 1:32 PM, Todd Lipcon <[email protected]> wrote:
>>> >> > On Sat, Aug 28, 2010 at 1:29 PM, Ryan Rawson <[email protected]> wrote:
>>> >> >
>>> >> >> a production server should be CPU bound, with memory caching etc.  Our
>>> >> >> prod systems do see a reasonable load, and jstack always shows some
>>> >> >> kind of wait generally...
>>> >> >>
>>> >> >> but we need more IO pushdown into HDFS.  For example if we are loading
>>> >> >> regions, why not do N at the same time?  That figure N is probably
>>> >> >> more dependent on how many disks/node you have than anything else
>>> >> >> really.
>>> >> >>
>>> >> >> For simple reads (eg: hfile) would it really be that hard to retrofit
>>> >> >> some kind of async Netty-based API on top of the existing DFSClient
>>> >> >> logic?
>>> >> >>
>>> >> >
>>> >> > Would probably be a duplication rather than a retrofit, but it's probably
>>> >> > doable -- the protocol is pretty simple for reads, and failure/retry is
>>> >> > much less complicated compared to writes (though still pretty complicated)
>>> >> >
>>> >> >
>>> >> >>
>>> >> >> -ryan
>>> >> >>
>>> >> >> > On Sat, Aug 28, 2010 at 1:11 PM, Todd Lipcon <[email protected]> wrote:
>>> >> >> > Depending on the workload, parallelism doesn't seem to matter much. On
>>> >> >> > my 8-core Nehalem test cluster with 12 disks each, I'm always network
>>> >> >> > bound far before I'm CPU bound for most benchmarks. ie jstacks show
>>> >> >> > threads mostly waiting for IO to happen, not blocked on locks.
>>> >> >> >
>>> >> >> > Is that not the case for your production boxes?
>>> >> >> >
>>> >> >> > On Sat, Aug 28, 2010 at 1:07 PM, Ryan Rawson <[email protected]> wrote:
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
