That explains it. Thanks!

On Thu, Jan 15, 2009 at 2:11 PM, Jean-Daniel Cryans <[email protected]> wrote:

> Larry,
>
> This feature was done for 0.19.0, for which a release candidate is on the
> way.
>
> J-D
>
> On Thu, Jan 15, 2009 at 2:03 PM, Larry Compton <[email protected]> wrote:
>
> > I'm interested in trying this, but I'm not seeing "setAutoFlush()" and
> > "setWriteBufferSize()" in the "HTable" API (I'm using HBase 0.18.1).
> >
> > Larry
> >
> > On Sun, Jan 11, 2009 at 5:11 PM, Ryan Rawson <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > New user of hbase here. I've been trolling about in IRC for a few
> > > days, and been getting great help all around so far.
> > >
> > > The topic turns to importing data into hbase - I have largeish
> > > datasets I want to evaluate hbase performance on, so I've been
> > > working at importing said data. I've managed to get some impressive
> > > performance speedups, and I chronicled them here:
> > >
> > > http://ryantwopointoh.blogspot.com/2009/01/performance-of-hbase-importing.html
> > >
> > > To summarize:
> > > - Use the native HBase API in Java or Jython (or presumably any JVM
> > > language)
> > > - Disable table auto-flush and set a large write buffer (12 MB for me)
> > >
> > > At this point I can import an 18 GB, 440M-row comma-separated flat
> > > file in about 72 minutes using map-reduce. This is on a 3-node
> > > cluster, all running hdfs, hbase, and mapred, with 12 map tasks
> > > (4 per node). This hardware is loaner DB hardware, so once I get my
> > > real cluster I'll revise/publish new data.
> > >
> > > I look forward to meeting some of you next week at the hbase meetup
> > > at powerset!
> > >
> > > -ryan

--
Larry Compton
SRA International
240.373.5312 (APL)
443.742.2762 (cell)
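[The buffered-write setup Ryan describes — disable auto-flush, set a 12 MB write buffer, then commit rows in a loop — can be sketched roughly as below. This is a minimal sketch against the 0.19-era client API: setAutoFlush() and setWriteBufferSize() are the methods named in the thread, while the BatchUpdate/commit/flushCommits calls, the table name "mytable", and the column "data:value" are assumptions for illustration (later HBase releases replaced BatchUpdate with Put).]

```java
// Sketch of a buffered bulk-import loop, per the thread's advice:
// disable auto-flush so edits accumulate client-side, and use a large
// (12 MB) write buffer so round-trips to the region servers are batched.
// Assumes the HBase 0.19-era client API; names below are illustrative.
import java.io.BufferedReader;
import java.io.FileReader;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;

public class BufferedImport {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "mytable");
    table.setAutoFlush(false);                  // buffer edits client-side
    table.setWriteBufferSize(12 * 1024 * 1024); // 12 MB, as in the post

    try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
      String line;
      while ((line = in.readLine()) != null) {
        // First CSV field is the row key, the rest is the value.
        String[] fields = line.split(",", 2);
        BatchUpdate update = new BatchUpdate(fields[0]);
        update.put("data:value", fields[1].getBytes());
        table.commit(update);                   // queued until buffer fills
      }
    }
    table.flushCommits();                       // push any remaining edits
  }
}
```

The point of the buffer is that each commit() no longer costs a network round-trip; edits are shipped in bulk whenever the 12 MB buffer fills, with a final flushCommits() to drain what is left.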
