On Sun, Mar 6, 2011 at 9:25 PM, Suraj Varma <svarma...@gmail.com> wrote:
> 1) If the asynchbase client is able to do this (i.e. talk the wire protocol
> and adjust based on server versions), why not the native hbase client? Is
> there something in the native client design that would make this too hard /
> not worth emulating?
Technically the native HBase client could do the same thing too. But due to
the way it's implemented, it's not as easy to do as with asynchbase.

> 2) Does asynchbase have any limitations (functionally or otherwise) compared
> to the native HBase client?

It doesn't support all the RPC types; more specifically, it doesn't implement
any administrative RPC (to create tables, for instance). As far as
limitations go, that's all I can think of. It has a couple of known bugs too,
but they only crop up when you restart your HBase cluster, and I'm working on
fixing them.

> 3) If Avro were the "native" protocol that HBase & client talks through,
> that is one thing (and that's what I'm hoping we end up with) - however,
> doesn't spinning up Avro gateways on each node (like what is currently
> available) require folks to scale up two layers (Avro gateway layer + HBase
> layer)? i.e. now we need to be worried about whether the Avro gateways can
> handle the traffic, etc.

While I'm not a big fan of the gateways myself, I must say that scaling them
is easy. They don't do a lot of work and don't add a lot of latency (unless
the GC is playing against you).

On Sun, Mar 6, 2011 at 9:40 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> Typically this has not been an issue. The particular design of the
> way that hadoop rpc (the rpc we use) works makes it difficult to offer
> multiple protocol/version support. To "fix it" would more or less
> require rewriting the entire protocol stack.

It doesn't have to be that drastic. It's possible to make incremental
improvements, and some of the changes that broke backwards compatibility
really did so "for free", when it would have been simple to preserve
compatibility. For instance, Get responses start with one byte called the
GET_VERSION. This byte is still unused (it's still at version 1) despite
incompatible changes made to this RPC.
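Just to illustrate what that version byte could buy us, here is a made-up
sketch (not actual HBase code; the "v2" format is invented for the example)
of how the client side could branch on it and keep understanding responses
from older servers:

  import java.io.DataInput;
  import java.io.IOException;

  // Hypothetical sketch, not the real HBase code: if the leading version
  // byte were bumped on every incompatible change, the client could branch
  // on it and keep talking to older servers.
  final class GetResponseReader {

    // Minimal stand-in for a de-serialized Get response.
    static final class Response {
      final byte[] value;
      Response(final byte[] value) { this.value = value; }
    }

    static Response read(final DataInput in) throws IOException {
      final byte version = in.readByte();  // the leading version byte
      switch (version) {
        case 1: return readV1(in);         // original on-wire format
        case 2: return readV2(in);         // format of some later release
        default:
          throw new IOException("Unsupported Get response version: " + version);
      }
    }

    private static Response readV1(final DataInput in) throws IOException {
      final byte[] value = new byte[in.readInt()];
      in.readFully(value);
      return new Response(value);
    }

    private static Response readV2(final DataInput in) throws IOException {
      in.readLong();      // e.g. a timestamp the newer format added
      return readV1(in);  // rest of the payload is unchanged
    }
  }

That's all it takes to let one client version talk to several server
versions, as long as the byte is actually bumped when the format changes.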
> The hbase client is fairly 'thick', it must intelligently route
> between different regionservers, handle errors, relook up meta data,
> use zookeeper to bootstrap, etc. This is part of making a scalable
> client though.
> [...]
> So again avro isn't going to be a magic bullet. Neither thrift. You
> can't just have a dumb client with little logic open up a socket and
> start talking to HBase. That isn't congruent with a scalable system
> unfortunately.

ElasticSearch uses regular HTTP clients and still needs to intelligently
route between different nodes, handle errors, look up meta data, etc. Thick
and thin clients both have their own sets of advantages / disadvantages, and
I'm not saying one is better than the other, but you can't really argue that
a scalable system requires thick clients.

On Sun, Mar 6, 2011 at 10:07 PM, Suraj Varma <svarma...@gmail.com> wrote:
> Very interesting.
> I was just about to send an additional mail asking why the HBase client also
> needs the hadoop jar (thereby tying the client onto the hadoop version as
> well) - but, I guess at the least the hadoop rpc is the dependency. So, now
> that makes sense.

No, it doesn't really make sense: HBase's jar actually contains a
copy-pasted-and-hacked version of the Hadoop RPC code. Most of the RPC stuff
happens inside the HBase jar; it only uses some helper functions from the
Hadoop jar.

> This is interesting (disappointing?) ... isn't the plan to substitute
> hadoop rpc with avro (or thrift) while still keeping all the smart logic in
> the client in place? I thought avro with its cross-version capabilities
> would have solved the versioning issues and allowed backward/forward
> compatibility. I mean, a "thick" client talking avro was what I had imagined
> the solution to be.

Yeah, that's the idea: change the on-wire protocol, but keep the logic in the
thick client. Personally, I just hope that whichever RPC protocol ends up
replacing the horrible Hadoop RPC in HBase will support asynchronous /
non-blocking operations on both the client side and the server side, so we
can finally move away from the inefficient model of thread pools containing
hundreds of threads (see the P.S. at the end of this mail for a rough idea of
what the non-blocking model looks like from the client's side).

> Based on the discussion below, is async-hbase a "thick" / smart client or
> something less than that?

asynchbase is a thick client too; it implements all the logic that is needed
to use HBase. Some of the logic is actually implemented slightly differently,
but as far as end-users are concerned these are implementation details (for
better performance / reliability / scalability).

--
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com
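P.S. Since the question came up of what asynchbase looks like to an
application, here is a rough sketch of a non-blocking get, written from
memory, so double-check it against the asynchbase javadoc; the table and row
names are obviously made up. The point is that get() hands you back a
Deferred right away instead of tying up a thread until the RegionServer
answers:

  import java.util.ArrayList;

  import org.hbase.async.GetRequest;
  import org.hbase.async.HBaseClient;
  import org.hbase.async.KeyValue;

  import com.stumbleupon.async.Callback;
  import com.stumbleupon.async.Deferred;

  final class AsyncGetExample {
    public static void main(final String[] args) throws Exception {
      final HBaseClient client = new HBaseClient("localhost");  // ZK quorum

      // get() returns immediately; no thread sits blocked waiting for the
      // RegionServer to answer.
      final Deferred<ArrayList<KeyValue>> d =
        client.get(new GetRequest("mytable", "myrow"));

      // The callback runs whenever the response eventually comes back.
      d.addCallback(new Callback<Object, ArrayList<KeyValue>>() {
        public Object call(final ArrayList<KeyValue> row) {
          for (final KeyValue kv : row) {
            System.out.println(new String(kv.qualifier()) + " = "
                               + new String(kv.value()));
          }
          return null;
        }
      });

      d.join();                  // only for this demo: wait for the result
      client.shutdown().join();  // flush & release resources before exiting
    }
  }

In a real application you would never call join(); you would keep chaining
callbacks instead, which is what lets a handful of threads drive thousands
of outstanding RPCs.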