What about the cautionary tale of xcievers?
On Aug 28, 2010 4:40 PM, "Jay Booth" <[email protected]> wrote:
> You guys could look at using ExecutorService -- set up a pool with max
> 1024 threads that are reused, then you're not spawning new threads for
> every read. Since those network waits are probably mostly latency,
> doing them in parallel could be a win that was possible from the HBase
> side. You might have problems with memory churn, though, if you're
> allocating 10+ buffers per read.
>
> On Sat, Aug 28, 2010 at 7:27 PM, Ryan Rawson <[email protected]> wrote:
>> One problem of performance right now is our inability to push io down
>> into the kernel. This is where async APIs help. A full read in hbase
>> might require reading 10+ files before ever returning a single row.
>> Doing these in parallel would be nice. Spawning 10+ threads isn't
>> really a good idea.
>>
>> Right now hadoop scales by adding processes, we just don't have that option.
>>
>> On Saturday, August 28, 2010, Todd Lipcon <[email protected]> wrote:
>>> Agreed, I think we'll get more bang for our buck by finishing up (reviving)
>>> patches like HDFS-941 or HDFS-347. Unfortunately performance doesn't seem to
>>> be the highest priority among our customers so it's tough to find much time
>>> to work on these things until we really get stability up to par.
>>>
>>> -Todd
>>>
>>> On Sat, Aug 28, 2010 at 3:36 PM, Jay Booth <[email protected]> wrote:
>>>
>>>> I don't think async is a magic bullet for its own sake, we've all
>>>> seen those papers that show good performance from blocking
>>>> implementations. Particularly, I don't think async is worth a whole
>>>> lot on the client side of a service, which HBase is to HDFS.
>>>>
>>>> What about an HDFS call for localize(Path) which attempts to replicate
>>>> the blocks for a file to the local datanode (if any) in a background
>>>> thread? If RegionServers called that function for their files every
>>>> so often, you'd eliminate a lot of bandwidth constraints, although the
>>>> latency of establishing a local socket for every read is still there.
>>>>
>>>> On Sat, Aug 28, 2010 at 4:42 PM, Todd Lipcon <[email protected]> wrote:
>>>> > On Sat, Aug 28, 2010 at 1:38 PM, Ryan Rawson <[email protected]> wrote:
>>>> >
>>>> >> One thought I had was if we have the writable code, surely just
>>>> >> putting a different transport around it wouldn't be THAT bad right :-)
>>>> >>
>>>> >> Of course writables are really tied to that DataInputStream or
>>>> >> whatever, so we'd have to work on that. Benoit said something about
>>>> >> writables needing to do blocking reads and that causing issues, but
>>>> >> there was a netty3 thing specifically designed to handle that by
>>>> >> throwing and retrying the op later when there was more data.
>>>> >>
>>>> >>
>>>> > The data transfer protocol actually doesn't do anything with Writables -
>>>> > it's all hand coded bytes going over the transport.
>>>> >
>>>> > I have some code floating around somewhere for translating between blocking
>>>> > IO and Netty - not sure where, though :)
>>>> >
>>>> > -Todd
>>>> >
>>>> >
>>>> >> On Sat, Aug 28, 2010 at 1:32 PM, Todd Lipcon <[email protected]> wrote:
>>>> >> > On Sat, Aug 28, 2010 at 1:29 PM, Ryan Rawson <[email protected]> wrote:
>>>> >> >
>>>> >> >> a production server should be CPU bound, with memory caching etc. Our
>>>> >> >> prod systems do see a reasonable load, and jstack always shows some
>>>> >> >> kind of wait generally...
>>>> >> >>
>>>> >> >> but we need more IO pushdown into HDFS.
>>>> >> >> For example if we are loading
>>>> >> >> regions, why not do N at the same time? That figure N is probably
>>>> >> >> more dependent on how many disks/node you have than anything else
>>>> >> >> really.
>>>> >> >>
>>>> >> >> For simple reads (eg: hfile) would it really be that hard to retrofit
>>>> >> >> some kind of async netty based API on top of the existing DFSClient
>>>> >> >> logic?
>>>> >> >>
>>>> >> >
>>>> >> > Would probably be a duplication rather than a retrofit, but it's probably
>>>> >> > doable -- the protocol is pretty simple for reads, and failure/retry is much
>>>> >> > less complicated compared to writes (though still pretty complicated)
>>>> >> >
>>>> >> >
>>>> >> >>
>>>> >> >> -ryan
>>>> >> >>
>>>> >> >> On Sat, Aug 28, 2010 at 1:11 PM, Todd Lipcon <[email protected]> wrote:
>>>> >> >> > Depending on the workload, parallelism doesn't seem to matter much. On my
>>>> >> >> > 8-core Nehalem test cluster with 12 disks each, I'm always network bound far
>>>> >> >> > before I'm CPU bound for most benchmarks. ie jstacks show threads mostly
>>>> >> >> > waiting for IO to happen, not blocked on locks.
>>>> >> >> >
>>>> >> >> > Is that not the case for your production boxes?
>>>> >> >> >
>>>> >> >> > On Sat, Aug 28, 2010 at 1:07 PM, Ryan Rawson <
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>
>>
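
For reference, here is a minimal sketch of the ExecutorService idea Jay describes at the top of the thread: one shared, bounded pool that reuses threads and issues the 10+ per-row file reads Ryan mentions in parallel. The readOneFile helper, the readAll wrapper, and the class name are hypothetical stand-ins, not real HBase or HDFS methods; the only real knob referenced is the datanode-side dfs.datanode.max.xcievers limit behind the "cautionary tale" question.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ParallelHFileReads {

      // One shared, bounded pool: threads are reused instead of spawned per read.
      // 1024 is the figure from the thread; note that each in-flight read also
      // occupies a DataXceiver thread on the datanode, capped by
      // dfs.datanode.max.xcievers, so the two limits have to be sized together.
      private static final ExecutorService READ_POOL =
          Executors.newFixedThreadPool(1024);

      // Hypothetical stand-in for whatever actually pulls one file's bytes
      // through DFSClient; not a real HBase or HDFS method.
      static byte[] readOneFile(String path) {
        return new byte[0];
      }

      // Submit the 10+ reads needed for a row in parallel, then wait for all.
      static List<byte[]> readAll(List<String> paths) throws Exception {
        List<Future<byte[]>> pending = new ArrayList<Future<byte[]>>();
        for (final String path : paths) {
          pending.add(READ_POOL.submit(new Callable<byte[]>() {
            public byte[] call() throws Exception {
              return readOneFile(path);
            }
          }));
        }
        List<byte[]> results = new ArrayList<byte[]>();
        for (Future<byte[]> f : pending) {
          results.add(f.get());   // blocks until that read completes
        }
        return results;
      }
    }

Jay's memory-churn caveat applies to the 10+ buffers allocated per read, not to the pool itself; a fixed pool only bounds thread creation.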

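The "netty3 thing" Ryan mentions for Writable-style blocking reads is presumably ReplayingDecoder from Netty 3 (an assumption, not stated in the thread): decode logic is written as if every byte were already buffered, and when the buffer runs dry the decoder aborts internally and re-runs decode() once more data arrives. A sketch with an assumed length-prefixed frame, not the actual HDFS data transfer framing:

    import org.jboss.netty.buffer.ChannelBuffer;
    import org.jboss.netty.channel.Channel;
    import org.jboss.netty.channel.ChannelHandlerContext;
    import org.jboss.netty.handler.codec.replay.ReplayingDecoder;
    import org.jboss.netty.handler.codec.replay.VoidEnum;

    // Decodes a hypothetical length-prefixed frame. If the buffer does not yet
    // hold the whole frame, ReplayingDecoder rewinds and calls decode() again
    // when more bytes arrive -- the "throw and retry the op later" behaviour
    // mentioned in the thread.
    public class LengthPrefixedFrameDecoder extends ReplayingDecoder<VoidEnum> {
      @Override
      protected Object decode(ChannelHandlerContext ctx, Channel channel,
                              ChannelBuffer buffer, VoidEnum state) {
        int length = buffer.readInt();   // may trigger a replay if bytes are missing
        return buffer.readBytes(length); // whole frame guaranteed available here
      }
    }

This only sketches the decoding side; Todd's point stands that an async read path would be closer to a reimplementation of the data transfer protocol than a retrofit of DFSClient.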