On Sun, Mar 6, 2011 at 9:25 PM, Suraj Varma <svarma...@gmail.com> wrote:
> 1) If the asynchbase client is able to do this (i.e. talk the wire protocol
> and adjust based on server versions), why not the native hbase client? Is
> there something in the native client design that would make this too hard /
> not worth emulating?
Technically the native HBase client could do the same thing too. But due to
the way it's implemented, it's not as easy to do as with asynchbase.

> 2) Does asynchbase have any limitations (functionally or otherwise) compared
> to the native HBase client?

It doesn't support all the RPC types; more specifically, it doesn't implement
any administrative RPC (to create tables, for instance). As far as
limitations go, that's all I can think of. It has a couple of known bugs too,
but they only crop up when you restart your HBase cluster, and I'm working on
fixing them.

> 3) If Avro were the "native" protocol that HBase & client talks through,
> that is one thing (and that's what I'm hoping we end up with) - however,
> doesn't spinning up Avro gateways on each node (like what is currently
> available) require folks to scale up two layers (Avro gateway layer + HBase
> layer)? i.e. now we need to be worried about whether the Avro gateways can
> handle the traffic, etc.

While I'm not a big fan of the gateways myself, I must say that scaling them
is easy. They don't do a lot of work and don't add a lot of latency (unless
the GC is playing against you).

On Sun, Mar 6, 2011 at 9:40 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> Typically this has not been an issue. The particular design of the
> way that hadoop rpc (the rpc we use) works makes it difficult to offer
> multiple protocol/version support. To "fix it" would more or less
> require rewriting the entire protocol stack.

It doesn't have to be that drastic. It's possible to make incremental
improvements, and some of the changes that broke backwards compatibility
really did so "for free", when it would have been simple to preserve
compatibility. For instance, Get responses start with one byte called the
GET_VERSION. This byte is still unused (it's still at version 1) despite
incompatible changes made to this RPC.
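Just to illustrate what that version byte could buy us, here is a made-up
sketch (not actual HBase code; the "v2" format is invented for the example)
of how the client side could branch on it and keep understanding responses
from older servers:

  import java.io.DataInput;
  import java.io.IOException;

  // Hypothetical sketch, not the real HBase code: if the leading version
  // byte were bumped on every incompatible change, the client could branch
  // on it and keep talking to older servers.
  final class GetResponseReader {

    // Minimal stand-in for a de-serialized Get response.
    static final class Response {
      final byte[] value;
      Response(final byte[] value) { this.value = value; }
    }

    static Response read(final DataInput in) throws IOException {
      final byte version = in.readByte();  // the leading version byte
      switch (version) {
        case 1: return readV1(in);         // original on-wire format
        case 2: return readV2(in);         // format of some later release
        default:
          throw new IOException("Unsupported Get response version: " + version);
      }
    }

    private static Response readV1(final DataInput in) throws IOException {
      final byte[] value = new byte[in.readInt()];
      in.readFully(value);
      return new Response(value);
    }

    private static Response readV2(final DataInput in) throws IOException {
      in.readLong();      // e.g. a timestamp the newer format added
      return readV1(in);  // rest of the payload is unchanged
    }
  }

That's all it takes to let one client version talk to several server
versions, as long as the byte is actually bumped when the format changes.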
> The hbase client is fairly 'thick', it must intelligently route
> between different regionservers, handle errors, relook up meta data,
> use zookeeper to bootstrap, etc. This is part of making a scalable
> client though.
> [...]
> So again avro isn't going to be a magic bullet. Neither thrift. You
> can't just have a dumb client with little logic open up a socket and
> start talking to HBase. That isn't congruent with a scalable system
> unfortunately.

ElasticSearch uses regular HTTP clients and still needs to intelligently
route between different nodes, handle errors, look up meta data, etc. Thick
and thin clients both have their own sets of advantages / disadvantages, and
I'm not saying one is better than the other, but you can't really argue that
a scalable system requires thick clients.

On Sun, Mar 6, 2011 at 10:07 PM, Suraj Varma <svarma...@gmail.com> wrote:
> Very interesting.
> I was just about to send an additional mail asking why the HBase client also
> needs the hadoop jar (thereby tying the client onto the hadoop version as
> well) - but, I guess at the least the hadoop rpc is the dependency. So, now
> that makes sense.

No, it doesn't really make sense: HBase's jar actually contains a
copy-pasted-and-hacked version of the Hadoop RPC code. Most of the RPC stuff
happens inside the HBase jar; it only uses some helper functions from the
Hadoop jar.

> This is interesting (disappointing?) ... isn't the plan to substitute
> hadoop rpc with avro (or thrift) while still keeping all the smart logic in
> the client in place? I thought avro with its cross-version capabilities
> would have solved the versioning issues and allowed backward/forward
> compatibility. I mean, a "thick" client talking avro was what I had imagined
> the solution to be.

Yeah, that's the idea: change the on-wire protocol, but keep the logic in the
thick client. Personally, I just hope that whichever RPC protocol ends up
replacing the horrible Hadoop RPC in HBase will support asynchronous /
non-blocking operations on both the client side and the server side, so we
can finally move away from the inefficient model of thread pools containing
hundreds of threads (see the P.S. at the end of this mail for a rough idea of
what the non-blocking model looks like from the client's side).

> Based on the discussion below, is async-hbase a "thick" / smart client or
> something less than that?

asynchbase is a thick client too; it implements all the logic that is needed
to use HBase. Some of the logic is actually implemented slightly differently,
but as far as end-users are concerned these are implementation details (for
better performance / reliability / scalability).

--
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com
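P.S. Since the question came up of what asynchbase looks like to an
application, here is a rough sketch of a non-blocking get, written from
memory, so double-check it against the asynchbase javadoc; the table and row
names are obviously made up. The point is that get() hands you back a
Deferred right away instead of tying up a thread until the RegionServer
answers:

  import java.util.ArrayList;

  import org.hbase.async.GetRequest;
  import org.hbase.async.HBaseClient;
  import org.hbase.async.KeyValue;

  import com.stumbleupon.async.Callback;
  import com.stumbleupon.async.Deferred;

  final class AsyncGetExample {
    public static void main(final String[] args) throws Exception {
      final HBaseClient client = new HBaseClient("localhost");  // ZK quorum

      // get() returns immediately; no thread sits blocked waiting for the
      // RegionServer to answer.
      final Deferred<ArrayList<KeyValue>> d =
        client.get(new GetRequest("mytable", "myrow"));

      // The callback runs whenever the response eventually comes back.
      d.addCallback(new Callback<Object, ArrayList<KeyValue>>() {
        public Object call(final ArrayList<KeyValue> row) {
          for (final KeyValue kv : row) {
            System.out.println(new String(kv.qualifier()) + " = "
                               + new String(kv.value()));
          }
          return null;
        }
      });

      d.join();                  // only for this demo: wait for the result
      client.shutdown().join();  // flush & release resources before exiting
    }
  }

In a real application you would never call join(); you would keep chaining
callbacks instead, which is what lets a handful of threads drive thousands
of outstanding RPCs.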