I'm interested in trying this, but I'm not seeing "setAutoFlush()" and "setWriteBufferSize()" in the "HTable" API (I'm using HBase 0.18.1).
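For reference, here's roughly what I was hoping to write based on the blog post. This is just a sketch against what I assume is a newer (0.19-era) client API -- the table/column names are placeholders, and the two buffer-related calls are exactly the ones I can't find in my 0.18.1 HTable, so maybe they only exist in trunk?

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;

public class BufferedImport {
    public static void main(String[] args) throws IOException {
        HBaseConfiguration conf = new HBaseConfiguration();

        // "mytable" and "data:value" are placeholders for whatever table and
        // column the import actually targets.
        HTable table = new HTable(conf, "mytable");

        // These are the two methods I don't see in 0.18.1 -- assuming a newer client:
        table.setAutoFlush(false);                    // don't flush on every commit
        table.setWriteBufferSize(12 * 1024 * 1024);   // ~12 MB client-side write buffer

        for (int i = 0; i < 1000000; i++) {
            BatchUpdate update = new BatchUpdate("row-" + i);
            update.put("data:value", ("value-" + i).getBytes());
            table.commit(update);                     // buffered, not sent per row
        }

        table.flushCommits();                         // push anything still buffered
    }
}

Is that the intended usage, or is there another way to get the same batching behavior on 0.18.1?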
Larry

On Sun, Jan 11, 2009 at 5:11 PM, Ryan Rawson <[email protected]> wrote:
> Hi all,
>
> New user of hbase here. I've been trolling about in IRC for a few days, and
> been getting great help all around so far.
>
> The topic turns to importing data into hbase - I have largeish datasets I
> want to evaluate hbase performance on, so I've been working at importing
> said data. I've managed to get some impressive performance speedups, and I
> chronicled them here:
>
> http://ryantwopointoh.blogspot.com/2009/01/performance-of-hbase-importing.html
>
> To summarize:
> - Use the native HBase API in Java or Jython (or presumably any JVM
> language)
> - Disable table auto flush, and set the write buffer large (12 MB for me)
>
> At this point I can import an 18 GB, 440M-row comma-separated flat file in
> about 72 minutes using map-reduce. This is on a 3-node cluster all running
> hdfs, hbase, and mapred with 12 map tasks (4 per node). This hardware is
> loaner DB hardware, so once I get my real cluster I'll revise/publish new
> data.
>
> I look forward to meeting some of you next week at the hbase meetup at
> Powerset!
>
> -ryan
