[
https://issues.apache.org/jira/browse/HBASE-13103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593220#comment-14593220
]
Mikhail Antonov commented on HBASE-13103:
-----------------------------------------
Yeah. We used to mention here that a region has some "ideal" size and that we
should try to get each region to this size, and I think we mentioned that the
ideal size might be a fixed fraction of the max size or something like that.
Maybe it needs to be more configurable.
I guess you assume here that every large table is supposed to be spread across
all RSs, and not just some subset (group?) of them? Also, to make sure I
understand correctly, when you say "250 regions per RS", do you mean 250 regions
of each table, or across all tables? As for this number of regions per RS, I
suppose we can derive it dynamically, like (max total number of regions in the
cluster, as limited by AM performance, see the issue about scaling to 1M
regions) / # of RSs? The total max number of regions could be set in config,
like 100k or 300k?
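A minimal sketch of that derivation, just to make the arithmetic concrete. The class and method names are made up for illustration and are not existing HBase API; the cluster-wide cap is assumed to come from some config value.

```java
// Illustrative only: RegionBudget is a hypothetical helper, not part of HBase.
final class RegionBudget {
    private RegionBudget() {}

    /**
     * Derives the per-RS region budget from a cluster-wide cap.
     *
     * @param maxRegionsInCluster assumed config value (e.g. 100k or 300k)
     * @param numRegionServers    number of live region servers
     * @return the cap divided evenly across servers, rounded down
     */
    static int maxRegionsPerServer(int maxRegionsInCluster, int numRegionServers) {
        if (maxRegionsInCluster <= 0 || numRegionServers <= 0) {
            throw new IllegalArgumentException("both arguments must be positive");
        }
        // Integer division: any remainder is left as unused headroom.
        return maxRegionsInCluster / numRegionServers;
    }
}
```

With a 100k cap and 400 RSs this gives 250 regions per RS, across all tables.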
I'm thinking about roughly the same logic for the lower and upper ends (for the
lower end another implicit threshold would be the max size of each region, and
for the upper limit I think there should be 2 more guards: 1) we should check
that the total number of regions doesn't approach the limits of AM, and 2) we
shouldn't break a table into ridiculously small regions (less than N HDFS
blocks?)).
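A rough sketch of how these guards could combine, reading "lower end" and "upper end" as bounds on a table's region count (that reading is an assumption). All names and parameters are hypothetical, nothing here is existing HBase API, and letting the max-region-size bound win when the bounds conflict is an arbitrary choice.

```java
// Illustrative only: RegionCountGuards is a hypothetical helper, not part of HBase.
final class RegionCountGuards {
    private RegionCountGuards() {}

    // Ceiling division for a >= 0 and b > 0.
    private static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    /**
     * @param tableSizeBytes       total size of the table
     * @param idealRegionSizeBytes the "ideal" region size we aim for
     * @param maxRegionSizeBytes   max size of a region (gives the lower bound on count)
     * @param minRegionSizeBytes   smallest acceptable region, e.g. N * HDFS block size
     * @param regionBudgetForTable regions this table may use before the cluster
     *                             approaches the AM limit (assumed to be computed elsewhere)
     * @return target number of regions for the table, always at least 1
     */
    static long targetRegionCount(long tableSizeBytes,
                                  long idealRegionSizeBytes,
                                  long maxRegionSizeBytes,
                                  long minRegionSizeBytes,
                                  long regionBudgetForTable) {
        if (tableSizeBytes < 0 || idealRegionSizeBytes <= 0 || maxRegionSizeBytes <= 0
                || minRegionSizeBytes <= 0 || regionBudgetForTable <= 0) {
            throw new IllegalArgumentException("sizes and budget must be positive");
        }
        // What we would pick with no guards at all.
        long desired = Math.max(1, ceilDiv(tableSizeBytes, idealRegionSizeBytes));
        // Lower end: enough regions that the average region fits under the max size.
        long lower = Math.max(1, ceilDiv(tableSizeBytes, maxRegionSizeBytes));
        // Upper guard 2: don't create regions smaller than the minimum size.
        long upperBySize = Math.max(1, tableSizeBytes / minRegionSizeBytes);
        // Upper guard 1: stay within the table's share of the cluster-wide cap.
        long upper = Math.min(upperBySize, regionBudgetForTable);
        // Clamp; if the bounds conflict, the lower bound wins (arbitrary choice).
        return Math.max(Math.min(desired, upper), lower);
    }
}
```

For example, a 100 GB table with a 5 GB ideal size, a 10 GB max size, a 1 GB minimum and a budget of 1000 regions comes out at 20 regions; shrinking the budget to 15 caps it at 15.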
> [ergonomics] add region size balancing as a feature of master
> -------------------------------------------------------------
>
> Key: HBASE-13103
> URL: https://issues.apache.org/jira/browse/HBASE-13103
> Project: HBase
> Issue Type: Improvement
> Components: Balancer, Usability
> Reporter: Nick Dimiduk
> Assignee: Mikhail Antonov
> Fix For: 2.0.0, 1.2.0
>
> Attachments: HBASE-13103-v0.patch, HBASE-13103-v1.patch
>
>
> Often enough, folks misjudge split points or otherwise end up with a
> suboptimal number of regions. We should have an automated, reliable way to
> "reshape" or "balance" a table's region boundaries. This would be for tables
> that contain existing data. This might look like:
> {noformat}
> Admin#reshapeTable(TableName, int numSplits);
> {noformat}
> or from the shell:
> {noformat}
> > reshape TABLE, numSplits
> {noformat}
> Better still would be to have a maintenance process, similar to the existing
> Balancer that runs AssignmentManager on an interval, to run the above
> "reshape" operation on an interval. That way, the cluster will automatically
> self-correct toward a desirable state.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)