> Too many regions kill HBase.

How many regions do you carry per RS? What was the effective limit you 
encountered? Curious.

The available public information is getting old now, but BigTable deployments
at Google limited the number of tablets per tablet server to ~100. This was no
doubt for a number of reasons related to their specific hardware
configuration, such as having enough RAM to keep in-memory tables resident
and the fact that they had only something like 160 or 320 GB of local storage;
but presumably it was also to limit the scope of failure of a given server and
to keep overheads down.

I advise our ops people to set notifications for when the number of regions per
HBase RegionServer gets above 500. The more regions per server, the more must
be relocated when a server fails, and the longer some regions will be in
transition. When we get close to the limit, it's time to add another
RegionServer. (Even if HBase could handle 10,000 regions per RegionServer, that
wouldn't be a good idea without a distributed master of some kind.) If you are
already scaling out for this reason, then the region carrying capacity of the
cluster is scaling too. We have many thousands of regions and region
housekeeping overhead is not an issue, although we are certainly not the
largest deployment. Currently the META region isn't split; I think that might
impose an effective upper bound at some point, but that can be fixed. There's
no architectural limit that I am aware of.
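A minimal sketch of the alerting rule described above, assuming per-server
region counts are already available as a plain dict (in practice they would
come from whatever monitoring you run against the master). The function names
and the 500 threshold constant are hypothetical illustrations, not part of any
HBase API. It flags servers above the threshold, estimates how many
RegionServers keep the average at or below it, and shows how many regions must
be reassigned when one server fails.

```python
import math

# Hypothetical alert threshold taken from the advice above; tune per cluster.
REGION_ALERT_THRESHOLD = 500


def servers_over_threshold(regions_per_server, threshold=REGION_ALERT_THRESHOLD):
    """Return, sorted, the RegionServers whose region count exceeds threshold."""
    return sorted(name for name, count in regions_per_server.items()
                  if count > threshold)


def min_servers_needed(total_regions, threshold=REGION_ALERT_THRESHOLD):
    """Smallest server count that keeps the average at or below threshold."""
    return max(1, math.ceil(total_regions / threshold))


def regions_moved_on_failure(regions_per_server, failed):
    """Regions to reassign when `failed` dies, and the average extra load
    each surviving server picks up."""
    moved = regions_per_server[failed]
    survivors = len(regions_per_server) - 1
    per_survivor = moved / survivors if survivors else float("inf")
    return moved, per_survivor


if __name__ == "__main__":
    counts = {"rs1": 420, "rs2": 510, "rs3": 480}  # illustrative numbers only
    print(servers_over_threshold(counts))               # ['rs2']
    print(min_servers_needed(sum(counts.values())))     # 3
    print(regions_moved_on_failure(counts, "rs2"))      # (510, 255.0)
```

The last function makes the failure-scope argument concrete: every region on
the failed server has to be reassigned, so the per-server count directly bounds
how much work a single failure creates.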

Best regards,

    - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
Tom White)



----- Original Message -----
> From: Vladimir Rodionov <[email protected]>
> To: "[email protected]" <[email protected]>
> Cc: 
> Sent: Wednesday, February 15, 2012 4:11 PM
> Subject: RE: Scan performance on a big table as combination of multiple logic 
> tables
> 
> 10 tables are fine. 1000 are not, especially when one does table
> pre-splitting to increase write perf.
> 
> Too many regions kill HBase.
> 
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [email protected]
> 
> ________________________________________
> From: Jacques [[email protected]]
> Sent: Wednesday, February 15, 2012 3:45 PM
> To: [email protected]
> Subject: Re: Scan performance on a big table as combination of multiple logic 
> tables
> 
> Out of curiosity, what do you perceive as the benefit to having only one
> table?  Are there reasons that you think one table would perform better
> than a few?
> 
> If you're splitting data within a table because you'd otherwise have
> millions of tables, I understand that and would concur with Vladimir's
> approach below.  However, if you're really looking at 10 tables versus one
> table, it seems like HBase is built exactly to make that work well (rather
> than having to write all sorts of application-level code to do what HBase
> already does).
> 
> thanks,
> Jacques
> 
> On Wed, Feb 15, 2012 at 1:57 PM, Pan, Thomas <[email protected]> wrote:
> 
>> 
>>  Since HBase is tailored to handle one table very well, we are thinking of
>>  putting multiple tables into one big table but on different column family
>>  sets. Our use case is a full table scan against single column value
>>  filters. As records from different "logical tables" are in different
>>  column families, could we speed up the scan by simply checking the column
>>  family referenced by these single column value filters first, before
>>  actually going through all the underlying K-V pairs? It would be great if
>>  HBase already works that way.
>> 
>> 
>>  $0.02,
>>  Thomas
>> 
>> 
> 
