[ 
https://issues.apache.org/jira/browse/HBASE-13103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592689#comment-14592689
 ] 

Nick Dimiduk commented on HBASE-13103:
--------------------------------------

Left some comments on RB.

Thinking further about the use case: this chore is aiming for an ideal state
of even cluster utilization. We seem to think of this in terms of (1) evenly
distributed load and (2) region servers that host no more regions than they
can hold, i.e. regions are sized "just right". We assume schema design results
in a natural application load over keys, so (1) can be approximated by uniform
region size and count. Uniform count per server is handled by the Balancer,
which leaves the Normalizer to worry about overall count and size. With too
few regions overall you have unused hosts ("I just stood up a 10 node cluster
but only one host is doing work!"); with too many you end up with 1k regions
per server.

At the lower end, we probably want to split relatively empty tables toward a 
goal of {{# of regions = 2x number of region servers}}. Or maybe 3x or 5x?

At the upper end, we want to push toward a target of ~250 regions per region 
server, with those regions being of uniform size if possible.
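
To make the two bounds concrete, here is a rough sketch of the check I have in mind. It is not taken from the patch: the class, enum, and method names are made up for illustration, the 2x multiplier is just the first of the candidate values above, and it glosses over whether the region count is per table or cluster-wide.

```java
// Hypothetical sketch only; none of these names exist in HBase.
final class RegionTargetSketch {
    // Lower bound: "# of regions = 2x number of region servers" (could be 3x or 5x).
    static final int MIN_REGIONS_PER_SERVER = 2;
    // Upper bound: the "~250 regions per region server" target.
    static final int MAX_REGIONS_PER_SERVER = 250;

    enum Action { SPLIT, MERGE, NONE }

    // Suggests which direction to push the region count, given the number of
    // live region servers and the current number of regions.
    static Action suggest(int regionServers, int regionCount) {
        if (regionServers <= 0) {
            throw new IllegalArgumentException("need at least one region server");
        }
        long lower = (long) regionServers * MIN_REGIONS_PER_SERVER;
        long upper = (long) regionServers * MAX_REGIONS_PER_SERVER;
        if (regionCount < lower) {
            return Action.SPLIT;  // too few regions: some hosts sit idle
        }
        if (regionCount > upper) {
            return Action.MERGE;  // too many regions per server
        }
        return Action.NONE;       // within the target band
    }
}
```

With 10 region servers, this would suggest splitting below 20 regions and merging above 2500. Making regions uniform in size within that band is a separate concern that this sketch does not cover.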

> [ergonomics] add region size balancing as a feature of master
> -------------------------------------------------------------
>
>                 Key: HBASE-13103
>                 URL: https://issues.apache.org/jira/browse/HBASE-13103
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer, Usability
>            Reporter: Nick Dimiduk
>            Assignee: Mikhail Antonov
>             Fix For: 2.0.0, 1.2.0
>
>         Attachments: HBASE-13103-v0.patch, HBASE-13103-v1.patch
>
>
> Often enough, folks misjudge split points or otherwise end up with a 
> suboptimal number of regions. We should have an automated, reliable way to 
> "reshape" or "balance" a table's region boundaries. This would be for tables 
> that contain existing data. This might look like:
> {noformat}
> Admin#reshapeTable(TableName, int numSplits);
> {noformat}
> or from the shell:
> {noformat}
> > reshape TABLE, numSplits
> {noformat}
> Better still would be to have a maintenance process, similar to the existing 
> Balancer that runs AssignmentManager on an interval, to run the above 
> "reshape" operation on an interval. That way, the cluster will automatically 
> self-correct toward a desirable state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
