Frens Jan Rumph created HBASE-28820:
---------------------------------------

             Summary: TableSkew cost scales beyond 1
                 Key: HBASE-28820
                 URL: https://issues.apache.org/jira/browse/HBASE-28820
             Project: HBase
          Issue Type: Bug
          Components: Balancer
    Affects Versions: 2.5.7
         Environment: We experienced the issue with Apache HBase 2.5.7 on 
Apache Hadoop 3.3.6 using Java 17 on Debian 12 (Bookworm).
            Reporter: Frens Jan Rumph


This may already be covered by later releases, but we noticed that the table 
skew cost function can produce cost values beyond 1. In our case, having over 
1000 tables caused the table skew cost to suppress the region count skew (and 
other) cost functions.

I think this is because the per-table costs are 'simply' summed in 
TableSkewCostFunction#cost. So if the number of tables with skew is large, this 
cost function may cause the balancer to favour actions that decrease this cost 
at too great an expense to other costs such as region count skew.
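
To illustrate the suspected mechanism, here is a minimal, self-contained sketch. 
It is not the HBase implementation: the class TableSkewSumSketch and its methods 
are hypothetical, and it assumes each per-table cost has already been scaled 
into [0, 1]. It only contrasts summing those costs with averaging them.

```java
import java.util.Arrays;

/**
 * Simplified model of combining per-table skew costs; not HBase's actual code.
 * Assumption: every per-table cost is already scaled into [0, 1].
 */
final class TableSkewSumSketch {

  /** Sums the per-table costs, so the result can grow up to the number of tables. */
  static double summedCost(double[] perTableCosts) {
    return Arrays.stream(perTableCosts).sum();
  }

  /** Averages the per-table costs, so the result stays within [0, 1]. */
  static double averagedCost(double[] perTableCosts) {
    return Arrays.stream(perTableCosts).average().orElse(0.0);
  }

  public static void main(String[] args) {
    // Hypothetical input: 1000 tables, each with a modest skew cost of 0.035.
    double[] costs = new double[1000];
    Arrays.fill(costs, 0.035);
    System.out.println("summed   = " + summedCost(costs));
    System.out.println("averaged = " + averagedCost(costs));
  }
}
```

Under these assumptions, two fully skewed tables (cost 1 each) already give a 
summed cost of 2, while the average stays at 1. Averaging is only one possible 
way to keep the result bounded; it is shown here for contrast, not as the 
intended fix.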

Logging from the HBase master that shows this:

 
{code:java}
[...] balancer.StochasticLoadBalancer: dBalancer.balancer, initial weighted 
average imbalance=0.25500371101846336, functionCost=RegionCountSkewCostFunction 
: (multiplier=100000.0, imbalance=0.24272066309658274, need balance); 
PrimaryRegionCountSkewCostFunction : (not needed); MoveCostFunction : 
(multiplier=7.0, imbalance=0.0); ServerLocalityCostFunction : (multiplier=25.0, 
imbalance=0.6022498608833904, need balance); RackLocalityCostFunction : 
(multiplier=15.0, imbalance=0.0); TableSkewCostFunction : (multiplier=35.0, 
imbalance=35.24784226006047, need balance); RegionReplicaHostCostFunction : 
(not needed); RegionReplicaRackCostFunction : (not needed); 
ReadRequestCostFunction : (multiplier=5.0, imbalance=0.24057323733439073, need 
balance); WriteRequestCostFunction : (multiplier=5.0, 
imbalance=0.3233739875438904, need balance); MemStoreSizeCostFunction : 
(multiplier=5.0, imbalance=0.3195880383071082, need balance); 
StoreFileCostFunction : (multiplier=5.0, imbalance=0.23335375436276784, need 
balance);  computedMaxSteps=1000000 {code}
Note the {{TableSkewCostFunction : (multiplier=35.0, 
imbalance=35.24784226006047)}} part.

To work around this, we temporarily reduced the multiplier of the table skew 
cost function to 0.

The test case below fails on the HBase 2.5 and 2.6 branches. It simply creates 
two tables with two regions each and assigns both regions of each table to a 
single server (t1 on n1, t2 on n2).

 
{code:java}
@Test
public void testTableSkewCost() {
  TableName t1 = TableName.valueOf("t1");
  TableName t2 = TableName.valueOf("t2");

  TreeMap<ServerName, List<RegionInfo>> clusterState = new TreeMap<>();
  clusterState.put(ServerName.valueOf("n1", 16020,0), Arrays.asList(
    RegionInfoBuilder.newBuilder(t1).setRegionId(11).build(),
    RegionInfoBuilder.newBuilder(t1).setRegionId(12).build()
  ));
  clusterState.put(ServerName.valueOf("n2", 16020,0), Arrays.asList(
    RegionInfoBuilder.newBuilder(t2).setRegionId(21).build(),
    RegionInfoBuilder.newBuilder(t2).setRegionId(22).build()
  ));

  BalancerClusterState cluster =
    new BalancerClusterState(clusterState, null, null, null);

  Configuration conf = HBaseConfiguration.create();
  CostFunction costFunction = new TableSkewCostFunction(conf);
  costFunction.prepare(cluster);
  double cost = costFunction.cost();
  assertTrue(cost >= 0);
  assertTrue(cost <= 1.01);
} {code}
 

It's the second assertion that fails since the computed cost for this cluster 
state is 2.

I guess none of the existing cluster state test/mock configurations have a real 
table skew.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
