[ https://issues.apache.org/jira/browse/HBASE-26023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Clara Xiong updated HBASE-26023:
--------------------------------
Description:
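There is another bug in how the original tableSkew cost function aggregates
the cost per table: if we have 10 regions, one per table, evenly distributed
on 10 nodes, the cost is scaled to 1.0. The bounds are computed for the whole
cluster (min = numRegions / numServers = 1, max = numRegions = 10), but the
value sums numMaxRegionsPerTable over all tables (10 x 1 = 10), so
scale(1, 10, 10) = 1.0 even though the layout is perfectly balanced.
The more tables we have, the closer the value will be to 1.0, and the cost
function becomes useless.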
All the balancer tests were set up with large numbers of tables with minimal
regions per table. This artificially inflates the total cost and triggers
balancer runs. With this fix to TableSkewFunction, we need to overhaul the
tests too. We also need to add tests that reflect more diversified scenarios
for table distribution, such as large tables with large numbers of regions.
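{code:java}
protected double cost() {
  // Bounds are computed for the cluster as a whole...
  double max = cluster.numRegions;
  double min = ((double) cluster.numRegions) / cluster.numServers;
  double value = 0;
  // ...but the value sums the per-table maxima (max regions of a table on
  // any one server), so it grows with the number of tables.
  for (int i = 0; i < cluster.numMaxRegionsPerTable.length; i++) {
    value += cluster.numMaxRegionsPerTable[i];
  }
  LOG.info("min = {}, max = {}, cost= {}", min, max, value);
  return scale(min, max, value);
}
{code}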
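To make the aggregation problem concrete, here is a self-contained sketch
(not HBase code) that recomputes the original formula for a layout given as
regionCounts[table][server], next to a hypothetical per-table variant that
scales each table against its own bounds and averages the results.
TableSkewDemo, regionCounts, perTableCost and the scale() stand-in are
assumptions for illustration only: scale() is assumed to be a linear min-max
scaling clamped to [0, 1], and the per-table variant is not necessarily the
fix made for this issue.

{code:java}
import java.util.Arrays;

// Hypothetical demo class, not HBase code. Layout is regionCounts[table][server].
public class TableSkewDemo {

  // Assumed stand-in for the balancer's scale(): linear min-max scaling clamped to [0, 1].
  static double scale(double min, double max, double value) {
    if (max <= min) {
      return 0.0;
    }
    return Math.max(0.0, Math.min(1.0, (value - min) / (max - min)));
  }

  // Mirrors the original cost(): cluster-wide bounds, but a value that sums
  // each table's maximum region count on any one server.
  static double originalCost(int[][] regionCounts, int numServers) {
    int numRegions = 0;
    double value = 0;
    for (int[] perServer : regionCounts) {
      numRegions += Arrays.stream(perServer).sum();
      value += Arrays.stream(perServer).max().orElse(0);
    }
    double max = numRegions;
    double min = ((double) numRegions) / numServers;
    return scale(min, max, value);
  }

  // Hypothetical alternative: scale each table against its own bounds, then average.
  static double perTableCost(int[][] regionCounts, int numServers) {
    double total = 0;
    for (int[] perServer : regionCounts) {
      int tableRegions = Arrays.stream(perServer).sum();
      double min = Math.ceil(((double) tableRegions) / numServers); // best case
      double max = tableRegions;                                    // all on one server
      double value = Arrays.stream(perServer).max().orElse(0);
      total += scale(min, max, value);
    }
    return regionCounts.length == 0 ? 0.0 : total / regionCounts.length;
  }

  public static void main(String[] args) {
    int servers = 10;
    // 10 tables with one region each, table i hosted on server i: perfectly balanced.
    int[][] balanced = new int[10][servers];
    for (int t = 0; t < 10; t++) {
      balanced[t][t] = 1;
    }
    // One table with 10 regions, all on server 0: fully skewed.
    int[][] skewed = new int[1][servers];
    skewed[0][0] = 10;

    // prints "balanced: original=1.0, perTable=0.0"
    System.out.println("balanced: original=" + originalCost(balanced, servers)
        + ", perTable=" + perTableCost(balanced, servers));
    // prints "skewed:   original=1.0, perTable=1.0"
    System.out.println("skewed:   original=" + originalCost(skewed, servers)
        + ", perTable=" + perTableCost(skewed, servers));
  }
}
{code}

With 10 one-region tables spread over 10 servers, the original formula
returns 1.0, the same as a single 10-region table piled onto one server, so
it cannot tell the two layouts apart; the per-table variant returns 0.0 and
1.0 respectively.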
> Overhaul of test cluster set up for table skew
> ----------------------------------------------
>
> Key: HBASE-26023
> URL: https://issues.apache.org/jira/browse/HBASE-26023
> Project: HBase
> Issue Type: Sub-task
> Components: Balancer, test
> Reporter: Clara Xiong
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.3.4#803005)