TableOutputFormat also does this: it calls table.setAutoFlush(false), so Puts are buffered on the client and sent in batches rather than one at a time.
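To make the effect of turning auto-flush off concrete, here is a rough sketch in plain Java. It is a toy model of a client-side write buffer, not HBase code: the class WriteBufferSketch, its fields, and the 1000-byte limit are made up for illustration. Only the names in the comments (HTable.put(), HTable.flushCommits(), hbase.client.write.buffer) refer to the real client.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a client-side write buffer. This is NOT the HBase
// implementation; it only illustrates why put() with auto-flush
// disabled ends up sending writes in batches.
public class WriteBufferSketch {
    private final boolean autoFlush;
    private final long bufferLimitBytes; // stands in for hbase.client.write.buffer
    private final List<byte[]> pending = new ArrayList<>();
    private long pendingBytes = 0;
    private int roundTrips = 0;          // simulated trips to the server

    public WriteBufferSketch(boolean autoFlush, long bufferLimitBytes) {
        this.autoFlush = autoFlush;
        this.bufferLimitBytes = bufferLimitBytes;
    }

    // Plays the role of HTable.put(): queue the write, then flush if
    // auto-flush is on or the buffer has grown past its limit.
    public void put(byte[] payload) {
        pending.add(payload);
        pendingBytes += payload.length;
        if (autoFlush || pendingBytes > bufferLimitBytes) {
            flushCommits();
        }
    }

    // Plays the role of HTable.flushCommits(): send everything queued
    // so far in one simulated round trip.
    public void flushCommits() {
        if (pending.isEmpty()) {
            return;
        }
        roundTrips++;
        pending.clear();
        pendingBytes = 0;
    }

    public int roundTrips() {
        return roundTrips;
    }

    public static void main(String[] args) {
        byte[] row = new byte[100]; // pretend every Put is 100 bytes
        WriteBufferSketch auto = new WriteBufferSketch(true, 1000);
        WriteBufferSketch buffered = new WriteBufferSketch(false, 1000);
        for (int i = 0; i < 50; i++) {
            auto.put(row);
            buffered.put(row);
        }
        // Final flush, as you would do when the task finishes.
        auto.flushCommits();
        buffered.flushCommits();
        System.out.println("autoFlush=true  round trips: " + auto.roundTrips());
        System.out.println("autoFlush=false round trips: " + buffered.roundTrips());
    }
}
```

In this model the buffered writer makes far fewer round trips for the same 50 writes, which is the point of disabling auto-flush. The real client differs in many details, so treat this as an illustration only.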
Check out the HBase book for how the write buffer works with the HBase client:
http://hbase.apache.org/book.html#client

-----Original Message-----
From: edward choi [mailto:mp2...@gmail.com]
Sent: Tuesday, June 21, 2011 10:23 PM
To: common-u...@hadoop.apache.org; user@hbase.apache.org
Subject: TableOutputFormat not efficient than direct HBase API calls?

Hi,

I am writing a Hadoop application that uses HBase as both source and sink. There is no reducer job in my application. I am using TableOutputFormat as the OutputFormatClass.

I read on the Internet that, experimentally, it is faster to instantiate HTable directly and use HTable.batch() in the Map than to use TableOutputFormat as the Map's output class.

So I looked into the source code of org.apache.hadoop.hbase.mapreduce.TableOutputFormat. It looks like TableRecordWriter does not support batch updates, since TableRecordWriter.write() calls HTable.put(new Put()).

Am I right about this? Or does TableOutputFormat automatically do batch updates somehow? Or is there a specific way to do batch updates with TableOutputFormat?

Any explanation is greatly appreciated.

Ed