Also, I should mention for clarity that the raw 70TB capacity figure does not factor in 3x DFS replication, and we're putting a lot more than just HBase tables into DFS. Still, we'd like our HBase tables to grow very, very large with Web content and other things.
- Andy

--- On Thu, 10/2/08, Andrew Purtell <[EMAIL PROTECTED]> wrote:

> From: Andrew Purtell <[EMAIL PROTECTED]>
> Subject: Re: success story
> To: [email protected]
> Date: Thursday, October 2, 2008, 5:23 PM
>
> Yes, typo, sorry. 512MB.
>
> Our node specification is approximately:
> CPU: 2x 4-core Xeons @ 3GHz
> RAM: 8GB
> Disk: 1TB RAID-1 system volume, 4 1TB RAID-0 data volumes (for DFS)
>
> I'm experimenting with mapfile size limits. We started
> low to get lots of splits early. I've increased it to
> 512MB most recently to slow splitting. We're above the
> concurrent map capacity of the cluster already. I may try to
> push the split threshold up to 1GB, but of course I have
> concerns about that. The goal is to make effective use of
> the ~70TB capacity of the cluster without blowing up the
> region count to the point where there aren't enough
> region servers to effectively carry it.
>
> - Andy
>
> --- On Thu, 10/2/08, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
>
> > From: Jean-Daniel Cryans <[EMAIL PROTECTED]>
> > Subject: Re: success story
> > To: [email protected]
> > Date: Thursday, October 2, 2008, 4:47 PM
> >
> > Andrew,
> >
> > This is great!
> >
> > Is it a typo or you really have some regions as big as 250GB?
> >
> > What kind of machines do you use?
> >
> > Thx,
> >
> > J-D
> >
> > On Thu, Oct 2, 2008 at 7:11 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
> >
> > > I just wanted to take this opportunity to report an
> > > HBase success story.
> > >
> > > We are running Hadoop 0.18.1 and HBase 0.18.0.
> > >
> > > Our application is a web crawling application with concurrent
> > > batch content analysis of various kinds. All of the workflow
> > > components are implemented as subclasses of TableMap and/or
> > > TableReduce. (So yes there will be some minor refactoring
> > > necessary for 0.19...)
> > >
> > > We are now at ~300 regions, most of them 512GB, hosted on a
> > > cluster of 25 nodes. We see a constant rate of 2500
> > > requests/sec or greater, peaking periodically near 100K/sec
> > > when some of the batch scan tasks run. Since going into
> > > semi-production over last weekend there has been no downtime or
> > > service faults.
> > >
> > > Feel free to add "Trend Micro Advanced Threats
> > > Research" to the powered by page.
> > >
> > > - Andy
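
To make the capacity reasoning in the thread concrete: with 3x DFS replication, the ~70TB of raw capacity corresponds to roughly 23TB of logical data, and the split threshold determines how many regions that data would occupy once the cluster fills. Below is a minimal back-of-the-envelope sketch of that arithmetic; it assumes, purely for illustration, decimal units (1TB = 1,000,000MB) and that all usable DFS space goes to HBase tables, which as noted above is not actually the case.

    // Rough region-count estimate for the cluster discussed in this thread.
    // Assumptions (not from the thread): decimal units, and all usable DFS
    // space devoted to HBase tables.
    public class RegionCountEstimate {
        public static void main(String[] args) {
            final double rawCapacityTb = 70.0;  // raw DFS capacity of the cluster
            final int replicationFactor = 3;    // DFS block replication
            final int regionServers = 25;       // nodes in the cluster

            // Logical (un-replicated) data the cluster can hold: ~23 TB.
            double usableTb = rawCapacityTb / replicationFactor;

            // Region count if every region grows to the split threshold.
            double[] splitThresholdsMb = {512.0, 1024.0};  // current setting vs. the 1GB under consideration
            for (double thresholdMb : splitThresholdsMb) {
                double regions = usableTb * 1e6 / thresholdMb;
                System.out.printf("split threshold %4.0f MB -> ~%6.0f regions, ~%4.0f per region server%n",
                        thresholdMb, regions, regions / regionServers);
            }
        }
    }

Under those assumptions, a 512MB threshold works out to roughly 45,000 regions (about 1,800 per region server on 25 nodes), versus roughly half that at 1GB, which is the tradeoff behind pushing the split threshold up.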
