On Thu, Feb 16, 2012 at 3:55 PM, Jeff Whiting <[email protected]> wrote:
> It seems like the only heavy part of the client would be the zookeeper
> interactions (forgive my ignorance if I'm wrong).

ZooKeeper interactions are extremely simple for a client; that's not where the heavy part is. All a client needs to do with ZooKeeper is find where the -ROOT- region is, period. In the client I wrote, asynchbase, I don't even maintain an open connection to ZooKeeper, because 99.99% of the time it's unnecessary.

> Other than zookeeper only
> a basic understanding of regions need to be understood. So if the zookeeper
> interactions could be removed and pushed somewhere else in the stack that
> could make the client much thinner.

Using line count (per "wc -l") as a rough approximation of code complexity, here's a breakdown of asynchbase. Out of a total of 11k lines, the big chunks of code are:

  ZooKeeper code: 360 lines (not actually big, but I included it for comparison)
  Code for handling NoSuchRegionException: 500 lines
  Helper code to deal with byte arrays: 500 lines
  Helper code to deal with HBase RPC serialization: 700 lines
  Code to batch RPCs: 800 lines
  Low-level socket code, and wire serialization/deserialization: 800 lines
  Code to open, manage, close scanners: 1000 lines
  Code for looking up and caching regions: 1000 lines

> hopefully never again. IMHO since you are redoing the communication why not
> improve the protocol to allow for a leaner the client. A leaner client
> would be more likely to work across major hbase changes, would be easier to
> maintain, would hide implementation details and could have less
> dependencies.

Yes, a leaner client would be better. But the client is fat because Bigtable's design pushed a lot of logic down to the clients so that RPC routing decisions can be made there, relieving the tablet servers from having to do it. When you start to have tens of thousands of clients talking to a cluster, like Google does, it makes sense to push this work down to the many clients, rather than have the fewer TabletServers do it and re-route packets (adding extra hops etc). The overall system is more efficient this way.

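To make the "routing decisions in the client" point concrete, here is a minimal sketch of a client-side region cache. It is not asynchbase's actual code: the class, method names, and server addresses are made up for illustration, it assumes a single table, and it assumes row keys sort in unsigned lexicographic byte order. On a cache miss, a real client would still have to go look the region up in the cluster's metadata, which is where most of the remaining complexity lives.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical client-side region cache (illustration only, single table). */
final class RegionCacheSketch {

  /** Unsigned lexicographic byte order, assumed to match row key ordering. */
  static final Comparator<byte[]> ROW_ORDER = (a, b) -> {
    final int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      final int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
      if (cmp != 0) return cmp;
    }
    return a.length - b.length;
  };

  /** Where one region lives. An empty stopKey means "no upper bound". */
  static final class Region {
    final byte[] startKey;
    final byte[] stopKey;
    final String server;  // made-up "host:port" of the hosting server

    Region(byte[] startKey, byte[] stopKey, String server) {
      this.startKey = startKey;
      this.stopKey = stopKey;
      this.server = server;
    }

    /** True if row falls in [startKey, stopKey). */
    boolean contains(byte[] row) {
      return ROW_ORDER.compare(row, startKey) >= 0
          && (stopKey.length == 0 || ROW_ORDER.compare(row, stopKey) < 0);
    }
  }

  /** Regions indexed by start key, so a floor lookup finds the candidate. */
  private final TreeMap<byte[], Region> byStartKey = new TreeMap<>(ROW_ORDER);

  void add(Region r) {
    byStartKey.put(r.startKey, r);
  }

  /** Returns the cached server for this row, or null on a cache miss. */
  String locate(byte[] row) {
    final Map.Entry<byte[], Region> e = byStartKey.floorEntry(row);
    return (e != null && e.getValue().contains(row)) ? e.getValue().server : null;
  }

  /** Forgets the region covering this row, e.g. after a NoSuchRegionException. */
  void invalidate(byte[] row) {
    final Map.Entry<byte[], Region> e = byStartKey.floorEntry(row);
    if (e != null && e.getValue().contains(row)) {
      byStartKey.remove(e.getKey());
    }
  }

  static byte[] b(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    final RegionCacheSketch cache = new RegionCacheSketch();
    cache.add(new Region(b(""), b("m"), "rs1:60020"));
    cache.add(new Region(b("m"), b(""), "rs2:60020"));
    System.out.println(cache.locate(b("apple")));  // served from the cache
    cache.invalidate(b("zebra"));                  // pretend the region moved
    System.out.println(cache.locate(b("zebra")));  // now a cache miss
  }
}
```

With a cache like this, the client can send each RPC straight to the right server without asking anyone, and only pays for a lookup when a region splits or moves.
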
Leaner clients are better, but unfortunately lean clients are often dumb, so it's hard to find a good tradeoff between simplicity and efficiency.

> One of the reasons the client doesn't do well across major
> changes is because of how heavy it is. Even if the client is never
> implemented in another language a thinner client would seem to be an
> improvement.

Having maintained an HBase client written from scratch for about 2 years now, I can tell you that the only things I had to fix across HBase releases were wire-level serialization breakages. The heavy logic of the client has remained mostly unchanged since the days of HBase 0.20.

-- 
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com
