[jira] [Updated] (HBASE-26311) Balancer gets stuck in cohosted replica distribution

Clara Xiong (Jira) Sun, 10 Oct 2021 15:36:11 -0700


     [ 
https://issues.apache.org/jira/browse/HBASE-26311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Clara Xiong updated HBASE-26311:
--------------------------------
    Description: 
In production, we found a corner case where the balancer cannot make progress 
when there are cohosted replicas. This is reproduced on the master branch using 
the test added in HBASE-26310. The two cost functions do not provide a proper 
evaluation, so the balancer cannot make progress. 

 

  was:
In production, we found a corner case where the balancer cannot make progress 
when there are cohosted replicas. This is reproduced on the master branch using 
the test added in HBASE-26310. The two cost functions do not provide a proper 
evaluation, so the balancer cannot make progress. 

 

Another observation is that the imbalance weight is not updated properly by the 
cost functions during plan generation. The subsequent run reports a much higher 
imbalance.
{quote}2021-09-24 22:26:56,039 INFO 
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished 
computing new moving plan. Computation took 2400001 ms to try 1284702 different 
iterations.  Found a solution that moves 6941 regions; Going from a computed 
imbalance of 6389.260497305375 to a new imbalance of 21.03904901349833. 

2021-09-24 22:33:40,961 INFO 
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Running 
balancer because at least one server hosts replicas of the same region.

2021-09-24 22:33:40,961 INFO 
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Start 
StochasticLoadBalancer.balancer, initial weighted average 
imbalance=6726.357026325619, functionCost=RegionCountSkewCostFunction : 
(multiplier=500.0, imbalance=0.07721156356401288); 
PrimaryRegionCountSkewCostFunction : (multiplier=500.0, 
imbalance=0.06298215530179263); MoveCostFunction : (multiplier=7.0, 
imbalance=0.0, balanced); ServerLocalityCostFunction : (multiplier=25.0, 
imbalance=0.463289517245148); RackLocalityCostFunction : (multiplier=15.0, 
imbalance=0.25670928199727017); TableSkewCostFunction : (multiplier=500.0, 
imbalance=0.4378048676389543); RegionReplicaHostCostFunction : 
(multiplier=100000.0, imbalance=0.05809798270893372); 
RegionReplicaRackCostFunction : (multiplier=10000.0, 
imbalance=0.061018251681075886); ReadRequestCostFunction : (multiplier=5.0, 
imbalance=0.08235908576054465); WriteRequestCostFunction : (multiplier=5.0, 
imbalance=0.09385090828285425); MemStoreSizeCostFunction : (multiplier=5.0, 
imbalance=0.1327376982847744); StoreFileCostFunction : (multiplier=5.0, 
imbalance=0.07986594927573858);  computedMaxSteps=5579331200
{quote}
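
The per-function values in the last log entry appear to combine into the reported initial figure as a plain sum of multiplier × imbalance. That is an inference from the numbers in this log, not a statement about the balancer source. A minimal Python sketch (the names COSTS, weighted_imbalance and contributions are illustrative only) that reproduces the figure and ranks each function's contribution:

```python
# Multipliers and imbalance values copied from the log entry above.
# Assumption: the reported figure is the plain sum of multiplier * imbalance.
COSTS = {
    "RegionCountSkewCostFunction": (500.0, 0.07721156356401288),
    "PrimaryRegionCountSkewCostFunction": (500.0, 0.06298215530179263),
    "MoveCostFunction": (7.0, 0.0),
    "ServerLocalityCostFunction": (25.0, 0.463289517245148),
    "RackLocalityCostFunction": (15.0, 0.25670928199727017),
    "TableSkewCostFunction": (500.0, 0.4378048676389543),
    "RegionReplicaHostCostFunction": (100000.0, 0.05809798270893372),
    "RegionReplicaRackCostFunction": (10000.0, 0.061018251681075886),
    "ReadRequestCostFunction": (5.0, 0.08235908576054465),
    "WriteRequestCostFunction": (5.0, 0.09385090828285425),
    "MemStoreSizeCostFunction": (5.0, 0.1327376982847744),
    "StoreFileCostFunction": (5.0, 0.07986594927573858),
}


def weighted_imbalance(costs):
    """Sum multiplier * imbalance over all cost functions."""
    return sum(mult * imb for mult, imb in costs.values())


def contributions(costs):
    """Return (name, weighted value, share of total), largest first."""
    total = weighted_imbalance(costs)
    rows = [(name, mult * imb, mult * imb / total)
            for name, (mult, imb) in costs.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    print(f"total = {weighted_imbalance(COSTS):.6f}")
    for name, value, share in contributions(COSTS):
        print(f"{name:36s} {value:12.4f} {share:7.2%}")
```

Under that assumption the sum agrees with the logged 6726.357026325619, and the RegionReplicaHostCostFunction term accounts for roughly 86% of it, which is consistent with the cohosted-replica cost dominating the reported imbalance.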
 


> Balancer gets stuck in cohosted replica distribution
> ----------------------------------------------------
>
>                 Key: HBASE-26311
>                 URL: https://issues.apache.org/jira/browse/HBASE-26311
>             Project: HBase
>          Issue Type: Bug
>          Components: Balancer
>            Reporter: Clara Xiong
>            Assignee: Clara Xiong
>            Priority: Major
>
> In production, we found a corner case where the balancer cannot make progress 
> when there are cohosted replicas. This is reproduced on the master branch 
> using the test added in HBASE-26310. The two cost functions do not provide a 
> proper evaluation, so the balancer cannot make progress. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
