[
https://issues.apache.org/jira/browse/HBASE-12829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
cuijianwei updated HBASE-12829:
-------------------------------
Summary: Request count in RegionLoad may not accurate to compute the load
cost for region (was: Request count in RegionLoad may not accurate to compute
the region load cost)
> Request count in RegionLoad may not accurate to compute the load cost for
> region
> --------------------------------------------------------------------------------
>
> Key: HBASE-12829
> URL: https://issues.apache.org/jira/browse/HBASE-12829
> Project: HBase
> Issue Type: Improvement
> Components: Balancer
> Affects Versions: 0.99.2
> Reporter: cuijianwei
> Priority: Minor
>
> StochasticLoadBalancer#RequestCostFunction (ReadRequestCostFunction and
> WriteRequestCostFunction) computes the load cost for a region from a number
> of remembered region loads. Each region load records the total read/write
> request count at report time, counted since the region was opened. However,
> the request count is reset when the region moves, so the newly reported count
> no longer represents the total requests. For example, if a region has high
> write throughput, the WriteRequest in its region load will be very large
> after it has been online for a long time. If the region is then moved, the
> new WriteRequest will be much smaller, so the region contributes much less to
> the cost of the region server that hosts it. We may need to take the region
> open time into account to get a more accurate region load.
> Alternatively, how about using the read/write request count for each time
> slot instead of the total request count? With total counts, older read/write
> throughput contributes more to the cost computed by
> CostFromRegionLoadFunction#getRegionLoadCost:
> {code}
> protected double getRegionLoadCost(Collection<RegionLoad> regionLoadList) {
>   double cost = 0;
>   for (RegionLoad rl : regionLoadList) {
>     double toAdd = getCostFromRl(rl);
>     if (cost == 0) {
>       cost = toAdd;
>     } else {
>       cost = (.5 * cost) + (.5 * toAdd);
>     }
>   }
>   return cost;
> }
> {code}
> For example, assume the balancer remembers three loads for a region at
> times t1, t2, t3 (t1 < t2 < t3), and the write request counts are w1, w2, w3
> for the time slots [0, t1), [t1, t2), [t2, t3) respectively. The WriteRequest
> in the region loads at t1, t2, t3 will then be w1, w1 + w2, w1 + w2 + w3, and
> the WriteRequest cost will be:
> {code}
> 0.5 * (w1 + w2 + w3) + 0.25 * (w1 + w2) + 0.25 * w1 = w1 + 0.75 * w2 +
> 0.5 * w3
> {code}
> Here w1 contributes more to the cost than w2 and w3. However, intuitively, I
> think recent read/write throughput should represent the current load of the
> region better than older throughput. Therefore, how about using w1, w2 and
> w3 directly in the computation? The cost would then become:
> {code}
> 0.25 * w1 + 0.25 * w2 + 0.5 * w3
> {code}
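
The two weightings above can be checked numerically. The following is a minimal sketch, not HBase code: the class `RegionLoadCostSketch` and the helpers `smoothedCost` and `perSlotCounts` are hypothetical names introduced for illustration. The smoothing loop mirrors the quoted getRegionLoadCost. Treating a decreasing counter as a reset (for example after a region move) is an assumption about how per-slot counts might be derived, not something the issue specifies.

```java
class RegionLoadCostSketch {

    // Mirrors the quoted getRegionLoadCost: samples are ordered oldest first,
    // and each newer sample is averaged 50/50 with the running cost.
    static double smoothedCost(long[] samples) {
        double cost = 0;
        for (long s : samples) {
            cost = (cost == 0) ? s : 0.5 * cost + 0.5 * s;
        }
        return cost;
    }

    // Converts cumulative counters into per-slot counts.
    // ASSUMPTION: a drop between consecutive reports means the counter was
    // reset, so the newly reported value is used as that slot's count.
    static long[] perSlotCounts(long[] cumulative) {
        long[] slots = new long[cumulative.length];
        long prev = 0;
        for (int i = 0; i < cumulative.length; i++) {
            long delta = cumulative[i] - prev;
            slots[i] = (delta >= 0) ? delta : cumulative[i];
            prev = cumulative[i];
        }
        return slots;
    }

    public static void main(String[] args) {
        // w1 = 100, w2 = 200, w3 = 400, reported as cumulative totals.
        long[] cumulative = {100, 300, 700};

        // Cumulative counters: w1 + 0.75 * w2 + 0.5 * w3
        System.out.println(smoothedCost(cumulative));                // prints 450.0

        // Per-slot counts: 0.25 * w1 + 0.25 * w2 + 0.5 * w3
        System.out.println(smoothedCost(perSlotCounts(cumulative))); // prints 275.0
    }
}
```

With cumulative counters the oldest slot w1 carries a weight of 1.0, while with per-slot counts the most recent slot w3 carries the largest weight (0.5), which matches the intuition stated in the description.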