[ https://issues.apache.org/jira/browse/HBASE-15547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216932#comment-15216932 ]
Sean Busbey commented on HBASE-15547:
-------------------------------------
It needn't even be that an RS can't open regions at all, just that some RSes
might severely lag others for a variety of reasons. If one of our goals is to
keep MTTR reasonable, then an RS that is anomalous in how long it takes to open
regions should be getting less work.
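
To make the "less work" idea concrete, here is a rough sketch of one way to
weight placement by backlog. Everything in it is hypothetical and for
illustration only: the class name, the plan() method, and the
1 / (1 + pendingOpen) weighting are assumptions, not the existing balancer or
bulk assigner API, and it assumes per-server PENDING_OPEN counts are already
available and non-negative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not part of HBase.
public class PendingOpenAwareAssigner {

    /**
     * Splits regionCount regions across servers, weighting each server by
     * 1 / (1 + pendingOpen) so that servers with a backlog get fewer regions.
     * Assumes every pendingOpen count is >= 0.
     */
    public static Map<String, Integer> plan(
            Map<String, Integer> pendingOpenByServer, int regionCount) {
        Map<String, Integer> plan = new LinkedHashMap<>();
        if (pendingOpenByServer.isEmpty()) {
            return plan;
        }

        // Sum of all server weights, used to normalize each share.
        double totalWeight = 0.0;
        for (int pending : pendingOpenByServer.values()) {
            totalWeight += 1.0 / (1 + pending);
        }

        int assigned = 0;
        String leastBacklogged = null;
        int smallestBacklog = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> e : pendingOpenByServer.entrySet()) {
            double weight = 1.0 / (1 + e.getValue());
            // Round down; the leftover is handed out after the loop.
            int share = (int) Math.floor(regionCount * weight / totalWeight);
            plan.put(e.getKey(), share);
            assigned += share;
            if (e.getValue() < smallestBacklog) {
                smallestBacklog = e.getValue();
                leastBacklogged = e.getKey();
            }
        }

        // Regions lost to rounding go to the server with the smallest backlog.
        plan.merge(leastBacklogged, regionCount - assigned, Integer::sum);
        return plan;
    }
}
```

With this weighting, a server that has 3 regions stuck in PENDING_OPEN gets
roughly a quarter of the share of a server with none. A real implementation
would more likely express this as a cost term in the balancer and would need to
consider how long regions have been pending, not just how many.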
> Balancer should take into account number of PENDING_OPEN regions
> ----------------------------------------------------------------
>
> Key: HBASE-15547
> URL: https://issues.apache.org/jira/browse/HBASE-15547
> Project: HBase
> Issue Type: Improvement
> Components: Balancer, Operability, Region Assignment
> Affects Versions: 0.98.0, 1.0.0
> Reporter: Sean Busbey
> Priority: Critical
> Fix For: 2.0.0, 1.4.0
>
>
> We recently had a cluster get into a bad state where a subset of region
> servers consistently could not open new regions (but could continue serving
> the regions they already hosted).
> Recovering the cluster was just a matter of restarting region servers in
> sequence. However, this led to things getting substantially worse before they
> got better since the bulk assigner continued to place a uniform number of
> recovered regions across all servers, including onto those that could not
> open regions.
> It would be useful if the balancer could penalize region servers with a
> backlog of PENDING_OPEN regions and place more work on those region servers
> that are opening regions properly.