[https://issues.apache.org/jira/browse/HBASE-26311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Clara Xiong updated HBASE-26311:
--------------------------------
Description:
In production, we found a corner case where the balancer cannot make progress
when there are cohosted replicas (replicas of the same region hosted on one
server). This is reproduced on the master branch using the test added in
HBASE-26310. The two cost functions do not provide the evaluation the balancer
needs to make progress.
was:
In production, we found a corner case where the balancer cannot make progress
when there are cohosted replicas. This is reproduced on the master branch using
the test added in HBASE-26310. The two cost functions do not provide the
evaluation the balancer needs to make progress.
Another observation is that the imbalance weight is not properly updated by the
cost functions during plan generation, so the subsequent run reports a much
higher imbalance. In the log below, the plan finished at 22:26:56 claims to go
from an imbalance of 6389.26 to 21.04, yet the next run at 22:33:40 starts from
an imbalance of 6726.36.
{quote}2021-09-24 22:26:56,039 INFO
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished
computing new moving plan. Computation took 2400001 ms to try 1284702 different
iterations. Found a solution that moves 6941 regions; Going from a computed
imbalance of 6389.260497305375 to a new imbalance of 21.03904901349833.
2021-09-24 22:33:40,961 INFO
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Running
balancer because at least one server hosts replicas of the same region.
2021-09-24 22:33:40,961 INFO
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Start
StochasticLoadBalancer.balancer, initial weighted average
imbalance=6726.357026325619, functionCost=RegionCountSkewCostFunction :
(multiplier=500.0, imbalance=0.07721156356401288);
PrimaryRegionCountSkewCostFunction : (multiplier=500.0,
imbalance=0.06298215530179263); MoveCostFunction : (multiplier=7.0,
imbalance=0.0, balanced); ServerLocalityCostFunction : (multiplier=25.0,
imbalance=0.463289517245148); RackLocalityCostFunction : (multiplier=15.0,
imbalance=0.25670928199727017); TableSkewCostFunction : (multiplier=500.0,
imbalance=0.4378048676389543); RegionReplicaHostCostFunction :
(multiplier=100000.0, imbalance=0.05809798270893372);
RegionReplicaRackCostFunction : (multiplier=10000.0,
imbalance=0.061018251681075886); ReadRequestCostFunction : (multiplier=5.0,
imbalance=0.08235908576054465); WriteRequestCostFunction : (multiplier=5.0,
imbalance=0.09385090828285425); MemStoreSizeCostFunction : (multiplier=5.0,
imbalance=0.1327376982847744); StoreFileCostFunction : (multiplier=5.0,
imbalance=0.07986594927573858); computedMaxSteps=5579331200
{quote}
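The initial total in the 22:33:40 log entry can be reproduced from its per-function breakdown. The Java sketch below assumes that the logged total is simply the sum of multiplier × imbalance over all cost functions; the logged values agree with this to within floating-point rounding, but it is an inference from the log, not a reading of the HBase source. `ImbalanceSum`, `Term`, `loggedTerms`, `total`, and `dominant` are hypothetical names, not HBase API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of HBase): recomputes the balancer's logged
 * total from the per-function breakdown, assuming total = sum(multiplier * imbalance).
 */
public class ImbalanceSum {

    /** One logged cost-function entry: its multiplier and its reported imbalance. */
    static final class Term {
        final double multiplier;
        final double imbalance;

        Term(double multiplier, double imbalance) {
            this.multiplier = multiplier;
            this.imbalance = imbalance;
        }

        /** Weighted contribution of this function to the total. */
        double weighted() {
            return multiplier * imbalance;
        }
    }

    /** Values copied from the 22:33:40 "Start StochasticLoadBalancer.balancer" entry, in log order. */
    static Map<String, Term> loggedTerms() {
        Map<String, Term> terms = new LinkedHashMap<>();
        terms.put("RegionCountSkewCostFunction", new Term(500.0, 0.07721156356401288));
        terms.put("PrimaryRegionCountSkewCostFunction", new Term(500.0, 0.06298215530179263));
        terms.put("MoveCostFunction", new Term(7.0, 0.0));
        terms.put("ServerLocalityCostFunction", new Term(25.0, 0.463289517245148));
        terms.put("RackLocalityCostFunction", new Term(15.0, 0.25670928199727017));
        terms.put("TableSkewCostFunction", new Term(500.0, 0.4378048676389543));
        terms.put("RegionReplicaHostCostFunction", new Term(100000.0, 0.05809798270893372));
        terms.put("RegionReplicaRackCostFunction", new Term(10000.0, 0.061018251681075886));
        terms.put("ReadRequestCostFunction", new Term(5.0, 0.08235908576054465));
        terms.put("WriteRequestCostFunction", new Term(5.0, 0.09385090828285425));
        terms.put("MemStoreSizeCostFunction", new Term(5.0, 0.1327376982847744));
        terms.put("StoreFileCostFunction", new Term(5.0, 0.07986594927573858));
        return terms;
    }

    /** Assumed formula: the sum of multiplier * imbalance over every function. */
    static double total(Map<String, Term> terms) {
        double sum = 0.0;
        for (Term t : terms.values()) {
            sum += t.weighted();
        }
        return sum;
    }

    /** Name of the function with the largest weighted contribution. */
    static String dominant(Map<String, Term> terms) {
        String best = null;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Term> e : terms.entrySet()) {
            double w = e.getValue().weighted();
            if (w > bestValue) {
                bestValue = w;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Term> terms = loggedTerms();
        System.out.println("recomputed total = " + total(terms));
        System.out.println("dominant term    = " + dominant(terms));
    }
}
```

Running `main` prints a total of about 6726.357 and names RegionReplicaHostCostFunction as the dominant term: 100000.0 × 0.0581 contributes about 5809.8, and RegionReplicaRackCostFunction adds about 610.2, so the two replica cost functions account for roughly 6420 of the 6726.36 total.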
> Balancer gets stuck in cohosted replica distribution
> ----------------------------------------------------
>
> Key: HBASE-26311
> URL: https://issues.apache.org/jira/browse/HBASE-26311
> Project: HBase
> Issue Type: Bug
> Components: Balancer
> Reporter: Clara Xiong
> Assignee: Clara Xiong
> Priority: Major
>
> In production, we found a corner case where the balancer cannot make progress
> when there are cohosted replicas. This is reproduced on the master branch using
> the test added in HBASE-26310. The two cost functions do not provide the
> evaluation the balancer needs to make progress.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)