Hi all,

New user of HBase here. I've been hanging around in IRC for a few days, and
have been getting great help all around so far.

On the topic of importing data into HBase: I have largeish datasets I want
to evaluate HBase performance on, so I've been working on importing said
data. I've managed to get some impressive performance speedups, and I
chronicled them here:

http://ryantwopointoh.blogspot.com/2009/01/performance-of-hbase-importing.html

To summarize:
- Use the native HBase API in Java or Jython (or presumably any JVM
language)
- Disable table auto-flush and set the write buffer large (12 MB for me)
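For anyone who wants to try the same settings, here's a minimal sketch of what the two bullets above look like against the HBase Java client (HTable/Put style API). The table name, column family, and the sample input are all hypothetical placeholders, not from my actual import job:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

import java.util.Arrays;
import java.util.List;

public class BulkImportSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");   // hypothetical table

        // The two settings that made the difference for me:
        table.setAutoFlush(false);                    // batch puts client-side
        table.setWriteBufferSize(12 * 1024 * 1024);   // 12 MB write buffer

        for (String line : sampleRows()) {
            String[] fields = line.split(",");
            // first field as row key, second as the value
            Put put = new Put(Bytes.toBytes(fields[0]));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"),
                    Bytes.toBytes(fields[1]));
            table.put(put);   // buffered client-side, not flushed per row
        }

        table.flushCommits(); // push any edits still sitting in the buffer
        table.close();
    }

    // Stand-in for reading the comma-separated flat file.
    private static List<String> sampleRows() {
        return Arrays.asList("row1,value1", "row2,value2");
    }
}
```

With auto-flush off, each put() just lands in the client-side buffer, and the client only talks to the region servers once the buffer fills (or you call flushCommits()), which is where the speedup comes from.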

At this point I can import an 18 GB, 440M-row comma-separated flat file in
about 72 minutes using map-reduce.  This is on a three-node cluster, all
running HDFS, HBase, and mapred, with 12 map tasks (4 per node).  This
hardware is loaner DB hardware, so once I get my real cluster I'll
revise/publish new data.
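For a rough sense of what those numbers work out to (the 18 GB / 440M rows / 72 minutes / 12 map tasks figures are from the run above; the arithmetic is just back-of-the-envelope):

```python
# Back-of-the-envelope throughput from the import run described above.
rows = 440_000_000      # rows in the flat file
gb = 18                 # file size in GB
minutes = 72            # wall-clock import time
map_tasks = 12          # concurrent map tasks

seconds = minutes * 60
rows_per_sec = rows / seconds
mb_per_sec = gb * 1024 / seconds

print(f"{rows_per_sec:,.0f} rows/s overall")                   # ~101,852
print(f"{rows_per_sec / map_tasks:,.0f} rows/s per map task")  # ~8,488
print(f"{mb_per_sec:.1f} MB/s aggregate")                      # ~4.3
```

So roughly 100K rows/sec across the cluster; since these are small rows, the row rate is a more interesting number than the raw MB/s.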

I look forward to meeting some of you next week at the HBase meetup at
Powerset!

-ryan
