Sorry - missed the user group in my previous mail.
--Suraj

On Sun, Mar 6, 2011 at 10:07 PM, Suraj Varma <svarma...@gmail.com> wrote:
> Very interesting.
> I was just about to send an additional mail asking why the HBase client
> also needs the hadoop jar (thereby tying the client to the hadoop version
> as well) - but, I guess at the least the hadoop rpc is the dependency. So,
> now that makes sense.
>
> > One strategy is to deploy gateways on all client nodes and use localhost
> > as much as possible.
>
> This certainly scales up the gateway nodes - but it complicates upgrades.
> For instance, we will have 100+ clients talking to the cluster, and
> upgrading from 0.20.x to 0.90.x would be that much harder with
> version-specific gateway nodes all over the place.
>
> > So again avro isn't going to be a magic bullet. Neither thrift.
>
> This is interesting (disappointing?) ... isn't the plan to substitute
> hadoop rpc with avro (or thrift) while still keeping all the smart logic
> in the client in place? I thought avro, with its cross-version
> capabilities, would have solved the versioning issues and allowed
> backward/forward compatibility. I mean, a "thick" client talking avro was
> what I had imagined the solution to be.
>
> Glad to know that client compatibility is very much on the committers' /
> community's mind.
>
> Based on the discussion below, is async-hbase a "thick" / smart client or
> something less than that?
>
> >> 2) Does asynchbase have any limitations (functionally or otherwise)
> >> compared to the native HBase client?
>
> Thanks again.
> --Suraj
>
>
> On Sun, Mar 6, 2011 at 9:40 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>
>> On Sun, Mar 6, 2011 at 9:25 PM, Suraj Varma <svarma...@gmail.com> wrote:
>> > Thanks all for your insights into this.
>> >
>> > I would agree that providing mechanisms to support no-outage upgrades
>> > going forward would really be widely beneficial. I was looking forward
>> > to Avro for this reason.
>> >
>> > Some follow-up questions:
>> > 1) If the asynchbase client is able to do this (i.e. talk the wire
>> > protocol and adjust based on server versions), why not the native hbase
>> > client? Is there something in the native client design that would make
>> > this too hard / not worth emulating?
>>
>> Typically this has not been an issue. The particular design of
>> hadoop rpc (the rpc we use) makes it difficult to offer
>> multiple protocol/version support. To "fix it" would more or less
>> require rewriting the entire protocol stack. I'm glad we spent serious
>> time making the base storage layer and query paths fast, since without
>> those fundamentals a "better" RPC would be moot. From my measurements
>> I don't think we are losing a lot of performance in our current RPC
>> system, and unless we are very careful we'll lose a lot in a
>> thrift/avro transition.
>>
>> > 2) Does asynchbase have any limitations (functionally or otherwise)
>> > compared to the native HBase client?
>> >
>> > 3) If Avro were the "native" protocol that HBase & clients talk
>> > through, that is one thing (and that's what I'm hoping we end up with)
>> > - however, doesn't spinning up Avro gateways on each node (like what is
>> > currently available) require folks to scale up two layers (Avro gateway
>> > layer + HBase layer)? i.e. now we need to worry about whether the Avro
>> > gateways can handle the traffic, etc.
>>
>> The hbase client is fairly 'thick': it must intelligently route
>> between different regionservers, handle errors, re-look up meta data,
>> use zookeeper to bootstrap, etc. This is part of making a scalable
>> client though.
>> Having the RPC serialization in thrift or avro would
>> make it easier to write those kinds of clients for non-Java languages.
>> The gateway approach will probably be necessary for a while, alas. At
>> SU I am not sure that the gateway is adding a lot of latency to
>> small queries, since average/median latency is around 1ms. One
>> strategy is to deploy gateways on all client nodes and use localhost
>> as much as possible.
>>
>> > In our application, we have Java clients talking directly to HBase. We
>> > debated using a Thrift or Stargate layer (even though we have a Java
>> > client) just because of this easier upgrade-ability. But we finally
>> > decided to use the native HBase client because we didn't want to have
>> > to scale two layers rather than just HBase ... and Avro was on the
>> > road map. An HBase client talking native Avro directly to the RS (i.e.
>> > without intermediate "gateways") would have worked - but that was a
>> > ways ...
>>
>> So again avro isn't going to be a magic bullet. Neither thrift. You
>> can't just have a dumb client with little logic open up a socket and
>> start talking to HBase. That isn't congruent with a scalable system,
>> unfortunately. You need your clients to be smart and do a bunch of
>> work that otherwise would have to be done by a centralized node
>> or another middleman. Only if the client is smart can we send the
>> minimal RPCs over the shortest network path. Other systems have
>> servers bounce the requests to other servers, but that can promote
>> extra traffic at the cost of a simpler client.
>>
>> > I think now that we are in the .90s, an option to do no-outage
>> > upgrades (from the client's perspective) would be really beneficial.
>>
>> We'd all like this; it's foremost in pretty much every committer's mind
>> all the time. It's just a HUGE body of work. One that is fraught with
>> perils and danger zones. For example, it seemed avro would reign
>> supreme, but the RPC landscape is shifting back towards thrift.
>>
>> >
>> > Thanks,
>> > --Suraj
>> >
>> >
>> > On Sat, Mar 5, 2011 at 2:21 PM, Todd Lipcon <t...@cloudera.com> wrote:
>> >
>> >> On Sat, Mar 5, 2011 at 2:10 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>> >> > As for the past RPC, it's all well to complain that we didn't spend
>> >> > more time making it more compatible, but in a world where evolving
>> >> > features in an early platform is more important than keeping backwards
>> >> > compatibility (how many hbase 18 jars want to talk to a modern
>> >> > cluster? Like none.), I am confident we made the right choice. Moving
>> >> > forward I think the goal should NOT be to keep the current system
>> >> > compatible at all costs, but to look at things like avro and thrift,
>> >> > make a calculated engineering tradeoff, and get ourselves onto an
>> >> > extendable platform, even if there is a flag day. We aren't out of
>> >> > the woods yet, but eventually we will be.
>> >>
>> >> Hear hear! +1!
>> >>
>> >> -Todd
>> >> --
>> >> Todd Lipcon
>> >> Software Engineer, Cloudera
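
For reference, a minimal sketch of what that "thick" client hides behind a
single get() call, using the standard 0.90-era Java API (the ZooKeeper quorum
and table name below are hypothetical): the client bootstraps off ZooKeeper,
looks up -ROOT-/.META. to find the region's owning regionserver, and sends the
RPC there directly, with no gateway in between.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ThickClientSketch {
      public static void main(String[] args) throws Exception {
        // Client-side config; the quorum here is a hypothetical placeholder.
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3");

        // HTable does the "thick" work: ZooKeeper bootstrap, -ROOT-/.META.
        // lookups, region location caching, retries when regions move.
        HTable table = new HTable(conf, "mytable"); // hypothetical table name

        // The Get is routed straight to the regionserver hosting "row1".
        Result r = table.get(new Get(Bytes.toBytes("row1")));
        System.out.println(Bytes.toString(r.value()));

        table.close();
      }
    }

Even if the wire format underneath were avro or thrift, that routing logic
would still have to live somewhere - which is the point above about a dumb
client simply opening a socket not being congruent with a scalable system.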