[ 
https://issues.apache.org/jira/browse/HBASE-13103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484867#comment-14484867
 ] 

Mikhail Antonov commented on HBASE-13103:
-----------------------------------------

[~phobos182] thanks for the feedback! Very useful. I have quite a few 
questions I'd like to ask, if you don't mind, to better understand the real 
needs.

bq.  Given the time difference between when the commands were run, this could 
end up with different region boundaries between the clusters – which is not 
desired. So I second the idea of generates "reshaping plan" so it can be 
applied in the same manner on the slave cluster.

 - How strictly consistent are your master and slave clusters? How much can 
they diverge? Is the second cluster mostly for long-running analytics, which 
only dumps output into some other table?
 - So you don't have automatic splits now, as I understand, only pre-split 
tables? Otherwise, how are you ensuring that the region boundaries are exactly 
the same? What's the average region size?
 - Do you want region boundaries to be exactly the same, or approximately the 
same?

The current patch has a notion of a "reshaping plan", which includes 
parameters such as the split point (not computed yet, though :) ). It would be 
feasible to send these plans to the normalizer on the other side (or rather, 
to expose in the master RPC services a normalize() call that accepts a 
serialized reshaping plan), but the region names wouldn't be the same anyway.
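
To illustrate the region-name problem, here is a minimal sketch of how a plan 
step could be made portable between clusters. Everything in it is hypothetical 
(the class name PortablePlanStep, the Action enum, the "ACTION:base64" text 
form); it is not what the attached patch does. The assumption is that a step 
is identified by a row key rather than by a region name, since row keys mean 
the same thing on both clusters.

```java
import java.util.Base64;

// Hypothetical sketch, NOT the class from HBASE-13103-v0.patch.
// Assumption: a plan step is identified by a row key instead of a region
// name, because region names differ between clusters while row keys do not.
final class PortablePlanStep {
    enum Action { SPLIT, MERGE }

    final Action action;
    // SPLIT: the row key to split at.
    // MERGE: the boundary key shared by the two adjacent regions to merge.
    final byte[] boundaryKey;

    PortablePlanStep(Action action, byte[] boundaryKey) {
        this.action = action;
        this.boundaryKey = boundaryKey.clone();
    }

    // Text form "ACTION:base64(key)"; Base64 keeps binary row keys intact.
    String serialize() {
        return action.name() + ":"
            + Base64.getEncoder().encodeToString(boundaryKey);
    }

    // Parses the text form produced by serialize().
    static PortablePlanStep deserialize(String text) {
        int sep = text.indexOf(':');
        if (sep < 0) {
            throw new IllegalArgumentException(
                "missing ':' in plan step: " + text);
        }
        Action action = Action.valueOf(text.substring(0, sep));
        byte[] key = Base64.getDecoder().decode(text.substring(sep + 1));
        return new PortablePlanStep(action, key);
    }
}
```

The receiving side would still have to look up which of its own regions 
contains (or ends at) the given key before acting, so this only works if the 
boundaries have not already diverged too far.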

bq. Probably should think about performing a major compaction operation before 
the normalize policy runs.
Yeah, that makes sense. Though I think most people run major compactions 
infrequently, so making this a prerequisite would change that operational 
practice. How often do you run major compactions?

> [ergonomics] add region size balancing as a feature of master
> -------------------------------------------------------------
>
>                 Key: HBASE-13103
>                 URL: https://issues.apache.org/jira/browse/HBASE-13103
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: Usability
>            Reporter: Nick Dimiduk
>            Assignee: Mikhail Antonov
>             Fix For: 2.0.0, 1.1.0
>
>         Attachments: HBASE-13103-v0.patch
>
>
> Often enough, folks misjudge split points or otherwise end up with a 
> suboptimal number of regions. We should have an automated, reliable way to 
> "reshape" or "balance" a table's region boundaries. This would be for tables 
> that contain existing data. This might look like:
> {noformat}
> Admin#reshapeTable(TableName, int numSplits);
> {noformat}
> or from the shell:
> {noformat}
> > reshape TABLE, numSplits
> {noformat}
> Better still would be to have a maintenance process, similar to the existing 
> Balancer that runs AssignmentManager on an interval, to run the above 
> "reshape" operation on an interval. That way, the cluster will automatically 
> self-correct toward a desirable state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
