Also, I should mention for clarity that the raw 70TB capacity figure does not factor in 3x DFS replication, and we're putting a lot more than just HBase tables into DFS. Still, we'd like our HBase tables to grow very, very large with Web content and other things.
- Andy

--- On Thu, 10/2/08, Andrew Purtell <[EMAIL PROTECTED]> wrote:

> From: Andrew Purtell <[EMAIL PROTECTED]>
> Subject: Re: success story
> To: [email protected]
> Date: Thursday, October 2, 2008, 5:23 PM
>
> Yes, typo, sorry. 512MB.
>
> Our node specification is approximately:
> CPU: 2x 4-core Xeons @ 3GHz
> RAM: 8GB
> Disk: 1TB RAID-1 system volume, 4 1TB RAID-0 data volumes (for DFS)
>
> I'm experimenting with mapfile size limits. We started
> low to get lots of splits early. I've increased it to
> 512MB most recently to slow splitting. We're above the
> concurrent map capacity of the cluster already. I may try to
> push the split threshold up to 1GB, but of course I have
> concerns about that. The goal is to make effective use of
> the ~70TB capacity of the cluster without blowing up the
> region count to the point where there aren't enough
> region servers to effectively carry it.
>
> - Andy
>
> --- On Thu, 10/2/08, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
>
> > From: Jean-Daniel Cryans <[EMAIL PROTECTED]>
> > Subject: Re: success story
> > To: [email protected]
> > Date: Thursday, October 2, 2008, 4:47 PM
> >
> > Andrew,
> >
> > This is great!
> >
> > Is it a typo or you really have some regions as big as 250GB?
> >
> > What kind of machines do you use?
> >
> > Thx,
> >
> > J-D
> >
> > On Thu, Oct 2, 2008 at 7:11 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
> >
> > > I just wanted to take this opportunity to report an
> > > HBase success story.
> > >
> > > We are running Hadoop 0.18.1 and HBase 0.18.0.
> > >
> > > Our application is a web crawling application with concurrent
> > > batch content analysis of various kinds. All of the workflow
> > > components are implemented as subclasses of TableMap and/or
> > > TableReduce. (So yes there will be some minor refactoring
> > > necessary for 0.19...)
> > >
> > > We are now at ~300 regions, most of them 512GB, hosted on a
> > > cluster of 25 nodes. We see a constant rate of 2500
> > > requests/sec or greater, peaking periodically near 100K/sec
> > > when some of the batch scan tasks run. Since going into
> > > semi-production over last weekend there has been no downtime or
> > > service faults.
> > >
> > > Feel free to add "Trend Micro Advanced Threats
> > > Research" to the powered by page.
> > >
> > > - Andy
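
To make the capacity reasoning in the thread concrete: with 3x DFS replication, the ~70TB of raw capacity corresponds to roughly 23TB of logical data, and the split threshold determines how many regions that data would occupy once the cluster fills. Below is a minimal back-of-the-envelope sketch of that arithmetic; it assumes, purely for illustration, decimal units (1TB = 1,000,000MB) and that all usable DFS space goes to HBase tables, which as noted above is not actually the case.

    // Rough region-count estimate for the cluster discussed in this thread.
    // Assumptions (not from the thread): decimal units, and all usable DFS
    // space devoted to HBase tables.
    public class RegionCountEstimate {
        public static void main(String[] args) {
            final double rawCapacityTb = 70.0;  // raw DFS capacity of the cluster
            final int replicationFactor = 3;    // DFS block replication
            final int regionServers = 25;       // nodes in the cluster

            // Logical (un-replicated) data the cluster can hold: ~23 TB.
            double usableTb = rawCapacityTb / replicationFactor;

            // Region count if every region grows to the split threshold.
            double[] splitThresholdsMb = {512.0, 1024.0};  // current setting vs. the 1GB under consideration
            for (double thresholdMb : splitThresholdsMb) {
                double regions = usableTb * 1e6 / thresholdMb;
                System.out.printf("split threshold %4.0f MB -> ~%6.0f regions, ~%4.0f per region server%n",
                        thresholdMb, regions, regions / regionServers);
            }
        }
    }

Under those assumptions, a 512MB threshold works out to roughly 45,000 regions (about 1,800 per region server on 25 nodes), versus roughly half that at 1GB, which is the tradeoff behind pushing the split threshold up.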
