Re: Performance of hbase importing

Jean-Daniel Cryans Thu, 15 Jan 2009 11:11:44 -0800

Larry,

This feature was added in 0.19.0, for which a release candidate is on the
way.


J-D

On Thu, Jan 15, 2009 at 2:03 PM, Larry Compton
<[email protected]> wrote:

> I'm interested in trying this, but I'm not seeing "setAutoFlush()" and
> "setWriteBufferSize()" in the "HTable" API (I'm using HBase 0.18.1).
>
> Larry
>
> On Sun, Jan 11, 2009 at 5:11 PM, Ryan Rawson <[email protected]> wrote:
>
> > Hi all,
> >
> > New user of hbase here. I've been trolling about in IRC for a few days
> > and have been getting great help all around so far.
> >
> > The topic turns to importing data into hbase - I have largeish datasets I
> > want to evaluate hbase performance on, so I've been working at importing
> > said data.  I've managed to get some impressive performance speedups, and
> > I chronicled them here:
> >
> >
> > http://ryantwopointoh.blogspot.com/2009/01/performance-of-hbase-importing.html
> >
> > To summarize:
> > - Use the native HBase API in Java or Jython (or presumably any JVM
> > language)
> > - Disable table auto flush and set a large write buffer (12 MB for me)
> >
> > At this point I can import an 18 GB, 440 million row comma-separated flat
> > file in about 72 minutes using map-reduce.  This is on a 3 node cluster,
> > all running hdfs,hbase,mapred, with 12 map tasks (4 per node).  This
> > hardware is loaner DB hardware, so once I get my real cluster I'll
> > revise/publish new data.
> >
> > I look forward to meeting some of you next week at the hbase meetup at
> > powerset!
> >
> > -ryan
> >
>
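
For anyone wondering why the two settings in Ryan's summary matter, here is a
rough sketch of the idea. It is a toy model in plain Python, not HBase client
code: the class BufferedTable, its fields, and the load() helper are all made
up for illustration, and only the setter names echo the setAutoFlush() and
setWriteBufferSize() calls mentioned in this thread. The assumption it makes
is that with auto flush on, every put costs one round trip to the server,
while with auto flush off, puts pile up client-side until the buffer is full.

```python
class BufferedTable:
    """Toy stand-in for a table client. This is NOT the HBase HTable class."""

    def __init__(self):
        self.auto_flush = True                     # every put is sent at once
        self.write_buffer_size = 2 * 1024 * 1024   # arbitrary default for this sketch
        self._buffer = []
        self._buffered_bytes = 0
        self.round_trips = 0                       # simulated calls to the server
        self.rows_sent = 0

    def set_auto_flush(self, enabled):
        self.auto_flush = enabled

    def set_write_buffer_size(self, size_bytes):
        self.write_buffer_size = size_bytes

    def put(self, row_key, value):
        # Queue the row, then send if auto flush is on or the buffer is full.
        self._buffer.append((row_key, value))
        self._buffered_bytes += len(row_key) + len(value)
        if self.auto_flush or self._buffered_bytes >= self.write_buffer_size:
            self.flush_commits()

    def flush_commits(self):
        # One simulated round trip carries everything currently buffered.
        if not self._buffer:
            return
        self.round_trips += 1
        self.rows_sent += len(self._buffer)
        self._buffer = []
        self._buffered_bytes = 0


def load(table, n_rows):
    """Write n_rows small rows (44 bytes each) and return the round-trip count."""
    for i in range(n_rows):
        table.put(b"row-%08d" % i, b"x" * 32)
    table.flush_commits()  # send whatever is still buffered at the end
    return table.round_trips


unbuffered = BufferedTable()

buffered = BufferedTable()
buffered.set_auto_flush(False)
buffered.set_write_buffer_size(12 * 1024 * 1024)  # the 12 MB from the summary

print("auto flush on: ", load(unbuffered, 10_000), "round trips")
print("auto flush off:", load(buffered, 10_000), "round trips")
```

In this model the same rows arrive either way; only the number of round trips
changes. The one thing to remember with buffering is the final explicit flush,
otherwise the rows still sitting in the buffer are never sent.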
