[
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15938578#comment-15938578
]
Kahlil Oppenheimer edited comment on HBASE-17707 at 3/23/17 3:34 PM:
---------------------------------------------------------------------
bq. We cannot maintain two different cost functions for table skew. Let's
remove the old one from the code, and only have the new implementation in this
patch. We cannot have dead code lying around and rot. We can close HBASE-17706
as won't fix.
I will add the removal of this old cost function to my patch.
bq. The new candidate generator TableSkewCandidateGenerator is not added to the
SLB::candidateGenerators field which means that it is not used? I can only see
the test using it. Is this intended? It has to be enabled by default.
Good catch on the table skew candidate generator. I will add that to the
patch as well. I was originally going to do it in a separate patch, but it
makes much more sense to just do it here.
bq. Did you intend to use the raw variable here instead of calling scale again:
Yup! Let's call R the range [0, 1]. We know that scale() maps values into R. We
also know that sqrt() maps values from R -> R. Lastly, we know that .9 * r + .1
for any r in R yields another value in R. So we can be sure the outcome is in R.
No need to call the scale function :).
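
The range argument can be written out as two bounds. This is only a sketch of the reasoning above: s stands for whatever scale() returns and r for any intermediate value, and both symbols are introduced here for illustration rather than taken from the patch.

```latex
\begin{align*}
s \in [0,1] &\implies \sqrt{s} \in [0,1] \\
r \in [0,1] &\implies 0.9\,r + 0.1 \in [0.1,\,1] \subset [0,1]
\end{align*}
```

Since each step maps [0, 1] into [0, 1], any composition of these steps stays in [0, 1], whatever order they are applied in.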
Before opening the patch, I'm repeatedly running the tests hundreds of times
to feel more confident I haven't missed edge cases, since many of these test
failures are non-deterministic.
> New More Accurate Table Skew cost function/generator
> ----------------------------------------------------
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
> Issue Type: New Feature
> Components: Balancer
> Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
> Reporter: Kahlil Oppenheimer
> Assignee: Kahlil Oppenheimer
> Priority: Minor
> Fix For: 2.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch,
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch,
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch,
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch,
> HBASE-17707-11.patch, HBASE-17707-12.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal
> number of region moves required for a given table to perfectly balance the
> table across the cluster (i.e. as if the regions from that table had been
> round-robin-ed across the cluster). This number of moves is computed for each
> table, then normalized to a score between 0 and 1 by dividing by the number of
> moves required in the absolute worst case (i.e. the entire table is stored on
> one server), and stored in an array. The cost function then takes a weighted
> average of the average and maximum value across all tables. The weights in
> this average are configurable to allow for certain users to more strongly
> penalize situations where one table is skewed versus where every table is a
> little bit skewed. To spread this value more evenly across the range
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize
> the above TableSkewCostFunction. It first simply tries to move regions until
> each server has the right number of regions, then it swaps regions around
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with
> 100s of TBs of data and 100s of tables across dozens of servers and found
> both to be very performant and accurate.
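
The cost computation described above could be sketched roughly as follows. This is not the code from the attached patches: the class name TableSkewSketch, the method names, and the single maxWeight parameter are all hypothetical, and the sketch assumes each table is given as an array holding the number of its regions on each server.

```java
import java.util.Arrays;

// Illustrative sketch only; names and structure are assumptions, not the patch.
final class TableSkewSketch {

    // Minimal number of region moves needed to spread one table evenly,
    // given how many of its regions each server currently hosts.
    static int minMoves(int[] regionsPerServer) {
        int servers = regionsPerServer.length;
        if (servers == 0) {
            return 0;
        }
        int total = Arrays.stream(regionsPerServer).sum();
        int floor = total / servers;
        int serversWithExtra = total % servers;

        int[] sorted = regionsPerServer.clone();
        Arrays.sort(sorted); // ascending order
        int moves = 0;
        for (int i = 0; i < servers; i++) {
            // The most loaded servers are the ones allowed to keep floor + 1.
            int rankFromTop = servers - 1 - i;
            int target = rankFromTop < serversWithExtra ? floor + 1 : floor;
            // Every region above the target has to move somewhere else.
            moves += Math.max(0, sorted[i] - target);
        }
        return moves;
    }

    // Skew of one table in [0, 1]: moves needed divided by the moves needed
    // in the worst case, where the whole table sits on a single server.
    static double tableSkew(int[] regionsPerServer) {
        int servers = regionsPerServer.length;
        if (servers == 0) {
            return 0.0;
        }
        int total = Arrays.stream(regionsPerServer).sum();
        int ceil = (total + servers - 1) / servers;
        int worstCase = total - ceil;
        return worstCase == 0 ? 0.0 : (double) minMoves(regionsPerServer) / worstCase;
    }

    // Weighted average of the maximum and mean per-table skew, then a square
    // root. maxWeight in [0, 1] is a hypothetical stand-in for the
    // configurable weights mentioned in the description.
    static double cost(int[][] regionsPerServerByTable, double maxWeight) {
        if (regionsPerServerByTable.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        double max = 0.0;
        for (int[] table : regionsPerServerByTable) {
            double skew = tableSkew(table);
            sum += skew;
            max = Math.max(max, skew);
        }
        double mean = sum / regionsPerServerByTable.length;
        return Math.sqrt(maxWeight * max + (1.0 - maxWeight) * mean);
    }
}
```

Under these assumptions, a table laid out as {4, 0, 0, 0} has skew 1.0 and one laid out as {1, 1, 1, 1} has skew 0.0. Raising maxWeight penalizes a single badly skewed table more strongly than many slightly skewed ones.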