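The API is HTable.setAutoFlush(false), not setAutoCommit. Normally HTable makes a call to the HBase regionserver on every put(), but if you turn off auto-flush it buffers the puts on the client and sends a bunch of them in one RPC, either when the write buffer fills up or when you call flushCommits().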
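To make the difference concrete, here is a minimal, self-contained sketch of the client-side write buffer. BufferedTableSketch is a hypothetical class, not HBase's HTable: it borrows the setAutoFlush/flushCommits names only to mirror the behavior described above, counts puts rather than bytes (the real client sizes its buffer in bytes via setWriteBufferSize), and treats each flush as one RPC.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a client table handle; NOT the real HTable class.
// It only models the batching idea: with auto-flush on, every put() is sent
// immediately; with auto-flush off, puts accumulate until the buffer fills
// or flushCommits() is called, and each flush counts as one RPC.
public class BufferedTableSketch {
    private final int bufferLimit;                 // max puts held before an automatic flush
    private final List<String> buffer = new ArrayList<>();
    private boolean autoFlush = true;              // default: one RPC per put
    private int rpcCount = 0;

    public BufferedTableSketch(int bufferLimit) {
        this.bufferLimit = bufferLimit;
    }

    public void setAutoFlush(boolean autoFlush) {
        this.autoFlush = autoFlush;
    }

    public void put(String row) {
        buffer.add(row);
        // send now if auto-flush is on, or if the buffer has filled up
        if (autoFlush || buffer.size() >= bufferLimit) {
            flushCommits();
        }
    }

    public void flushCommits() {
        if (buffer.isEmpty()) {
            return;
        }
        rpcCount++;      // one round trip carries every buffered put
        buffer.clear();
    }

    public int rpcCount() {
        return rpcCount;
    }

    public static void main(String[] args) {
        BufferedTableSketch eager = new BufferedTableSketch(100);
        BufferedTableSketch batched = new BufferedTableSketch(100);
        batched.setAutoFlush(false);
        for (int i = 0; i < 250; i++) {
            eager.put("row-" + i);
            batched.put("row-" + i);
        }
        batched.flushCommits();  // send the 50 puts still sitting in the buffer
        System.out.println(eager.rpcCount() + " vs " + batched.rpcCount());
    }
}
```

Running main prints `250 vs 3`: with auto-flush on, every put is its own round trip; with it off, the same 250 puts go out in three batches. Note that the last partial batch is only sent by the explicit flushCommits() call, so the same discipline applies when writing with the real client.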
I can't say for certain how the Thrift batch commit works; I don't have the code up right now. It's a fairly simple codepath, though.

Finally, I found that a simple MapReduce job was 3x slower when written in Jython. I haven't used it since.

On Sep 3, 2009 1:26 AM, "Sylvain Hellegouarch" <[email protected]> wrote:

> > Thrift spawns as many threads as requests, so running more than one
> > shouldn't benefit you much ...
>
> Being a little unaware of Java's cleverness with threads, I cannot really say, but you're probably right.
>
> > I run 1 thriftserver per regionserver, co-existing, and then use
> > TSocketPool on the client si...
>
> I'm a little confused, then, as to what the difference is between the bulk commit you mention and the batch mutations support in the Thrift interface. Moreover, the HBase 0.20 API is a bit unclear as to when the commit is done when using Put. In fact, I'm a little unclear as to what the best practice is for writing lots of rows as efficiently as possible. One by one? Batch mutations?
>
> > Personally, we use thrift for php scripts, and use the Java API for
> > map-reduces and bulk data...
>
> We will probably be using Pig Latin for the M/R, with a Java adapter to fetch rows from HBase. However, we do use Python for writing, and I'm willing to use Jython, but that would probably create other dependency issues that I'd be happy to avoid if Thrift is good enough :)
>
> Thanks,
> - Sylvain
> --
> Sylvain Hellegouarch
> http://www.defuze.org
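On the "one by one or batch mutations?" question: whatever the server does with a batch, the client-side win comes from sending many rows per request. A minimal sketch, assuming you already have some batch-mutation call to send each group to (the exact Thrift method is not shown here); RowBatcher and its chunk helper are hypothetical names for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a large list of rows into fixed-size groups so
// that each group can be sent in a single batch request instead of one
// request per row. It does not talk to HBase or Thrift itself.
public class RowBatcher {
    public static <T> List<List<T>> chunk(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            // copy the slice so each batch is independent of the source list
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 1050; i++) {
            rows.add("row-" + i);
        }
        List<List<String>> batches = chunk(rows, 500);
        // 1050 rows at 500 per request: 3 round trips instead of 1050
        System.out.println(batches.size() + " requests");
    }
}
```

Running main prints `3 requests`. The batch size is a tuning knob: larger groups mean fewer round trips but bigger individual requests, so pick it by measuring against your own cluster rather than by a fixed rule.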
