On Thu, Jan 15, 2009 at 2:08 AM, Wesley Chow <[email protected]> wrote:
>
> The HBase meetup was great -- thanks for putting together the Skype chat for
> those of us in the rest of the world.
>
> There was some talk about a C API via Thrift. The PyLucene folk have a code
> generator for using C++ and Python with JNI:
> http://svn.osafoundation.org/pylucene/trunk/jcc/jcc/README. It seems to me
> that this might be a reasonable route as well, though I have no clue how
> active or stable that code is.
>
> But, a couple of questions:
>
> Does anybody care if there's a C++ API, but no C API?
>
> Is HBase RPC better than Thrift? If so, can Thrift really beat JNI? If not,
> prefer PyLucene's JCC over Thrift.
>
> If HBase RPC is worse than Thrift, then adopting Thrift and dropping RPC
> seems smart to me. You save on the messaging layer work, plus you get all
> those other language bindings for free.
>

Thrift is certainly very useful. I have just released an ORM-like interface for HBase called OHM (http://belowdeck.kissintelligentsystems.com/ohm/). It is designed to be cross-platform, and the Thrift APIs are essential to us because most of our project is written in C#. OHM has a compiler that generates the interface code for each language. Without the Thrift APIs it would be difficult to interface with languages like .NET and Perl.

I may be wrong, but doesn't the current Thrift API implementation just provide an interface to the existing Java client? I asked about the typical production use case for the Thrift API and was told that you run the Thrift server on each client machine (web server), so Thrift only uses a local connection, and the Java client then talks to the cluster using Hadoop RPC.

You could, I am sure, replace Hadoop RPC with a Thrift-based one, but wouldn't all the client logic need to be reimplemented in each language to take advantage of that language neutrality? Would this not split a lot of development effort across keeping all the clients feature-complete and compatible? Maybe I am overestimating the amount of work in porting the client, but it would certainly cut out one RPC layer.

I don't know much about Hadoop RPC, but I know Thrift is designed to be very efficient in terms of bytes sent down the wire, light years ahead of XML-based RPCs. I have looked at some of the implementation details of Google's own RPC tool, and it is very impressive what lengths they go to in order to get this efficiency. They even have a way of encoding integers onto the wire that, for most use cases, is more compact than just dumping the bits from memory. Of course, some of these techniques trade CPU time for that efficiency. In a cluster environment where everything is on Gigabit links, I am not so sure it's such a good idea; I suppose only benchmarking would tell.

Charlie M
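
P.S. To make the integer-encoding point concrete, here is a minimal sketch of a base-128 variable-length integer ("varint"), which I assume is the kind of encoding meant above. The function names are my own for illustration and are not taken from any library; the sketch only handles non-negative integers. Each byte carries 7 bits of the value, and the high bit says whether more bytes follow, so small numbers take one byte instead of a fixed four or eight.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (illustrative helper)."""
    if n < 0:
        raise ValueError("this sketch only supports non-negative integers")
    out = bytearray()
    while True:
        low7 = n & 0x7F          # take the lowest 7 bits
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a single base-128 varint from the start of data (illustrative helper)."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift  # least significant group comes first
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")


if __name__ == "__main__":
    for value in (1, 127, 128, 300, 2**32 - 1):
        encoded = encode_varint(value)
        print(value, len(encoded), decode_varint(encoded))
```

The shift-and-mask loop on every integer is where the CPU cost comes from, which is the trade-off against wire size that would need benchmarking on a Gigabit cluster.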
