saintstack commented on a change in pull request #3415:
URL: https://github.com/apache/hbase/pull/3415#discussion_r668073537



##########
File path: 
hbase-balancer/src/main/java/org/apache/hadoop/hbase/master/balancer/BalancerClusterState.java
##########
@@ -290,7 +294,9 @@ public String getRack(ServerName server) {
     }
 
     numTables = tables.size();
+    LOG.debug("number of tables = {}", numTables);

Review comment:
       nit: for next time, we capitalize log messages and if you look at other 
logs, there is no space around '=' when used in log messages.

##########
File path: 
hbase-balancer/src/main/java/org/apache/hadoop/hbase/master/balancer/BalancerClusterState.java
##########
@@ -301,15 +307,26 @@ public String getRack(ServerName server) {
     for (int i = 0; i < regionIndexToServerIndex.length; i++) {
       if (regionIndexToServerIndex[i] >= 0) {
         
numRegionsPerServerPerTable[regionIndexToServerIndex[i]][regionIndexToTableIndex[i]]++;
+        numRegionsPerTable[regionIndexToTableIndex[i]]++;
       }
     }
 
-    numMaxRegionsPerTable = new int[numTables];
+    // Avoid repeated computation for planning
+    meanRegionsPerTable = new double[numTables];
+    regionSkewByTable = new double[numTables];
+    maxRegionSkewByTable  = new double[numTables];
+    minRegionSkewByTable = new double[numTables];
+
+    for (int i = 0; i < numTables; i++) {
+      meanRegionsPerTable[i] = Double.valueOf(numRegionsPerTable[i]) / 
numServers;
+      minRegionSkewByTable[i] += 
DoubleArrayCost.getMinSkew(numRegionsPerTable[i], numServers);
+      maxRegionSkewByTable[i] += 
DoubleArrayCost.getMaxSkew(numRegionsPerTable[i], numServers);
+    }
+
     for (int[] aNumRegionsPerServerPerTable : numRegionsPerServerPerTable) {
-      for (tableIndex = 0; tableIndex < aNumRegionsPerServerPerTable.length; 
tableIndex++) {
-        if (aNumRegionsPerServerPerTable[tableIndex] > 
numMaxRegionsPerTable[tableIndex]) {
-          numMaxRegionsPerTable[tableIndex] = 
aNumRegionsPerServerPerTable[tableIndex];
-        }
+      for (int tableIdx = 0; tableIdx < aNumRegionsPerServerPerTable.length; 
tableIdx++) {
+        regionSkewByTable[tableIdx] += 
Math.abs(aNumRegionsPerServerPerTable[tableIdx]
+          - meanRegionsPerTable[tableIdx]);

Review comment:
       nit: style. In the code base, we usually have operator on the end of the 
line rather than the start as it is here... When on the end of the line, the 
line looks to be 'hanging' so dev will continue reading... With this style, the 
dev might miss the continuation. Just style.

##########
File path: 
hbase-balancer/src/main/java/org/apache/hadoop/hbase/master/balancer/DoubleArrayCost.java
##########
@@ -106,4 +88,32 @@ private static double getSum(double[] stats) {
     }
     return total;
   }
+
+  /**
+   * Return the min skew of distribution
+   */
+  public static double getMinSkew(double total, double numServers) {

Review comment:
       nit: one way of addressing the nick comment would be to add javadoc for 
total that said what it was.....

##########
File path: 
hbase-balancer/src/main/java/org/apache/hadoop/hbase/master/balancer/BalancerClusterState.java
##########
@@ -671,22 +688,21 @@ void regionMoved(int region, int oldServer, int 
newServer) {
     int tableIndex = regionIndexToTableIndex[region];
     if (oldServer >= 0) {
       numRegionsPerServerPerTable[oldServer][tableIndex]--;
+      // update regionSkewPerTable for the move from old server
+      regionSkewByTable[tableIndex] +=
+        Math.abs(numRegionsPerServerPerTable[oldServer][tableIndex]
+          - meanRegionsPerTable[tableIndex])
+          - Math.abs(numRegionsPerServerPerTable[oldServer][tableIndex] + 1
+          - meanRegionsPerTable[tableIndex]);
     }
     numRegionsPerServerPerTable[newServer][tableIndex]++;
 
-    // check whether this caused maxRegionsPerTable in the new Server to be 
updated
-    if (numRegionsPerServerPerTable[newServer][tableIndex] > 
numMaxRegionsPerTable[tableIndex]) {
-      numMaxRegionsPerTable[tableIndex] = 
numRegionsPerServerPerTable[newServer][tableIndex];
-    } else if (oldServer >= 0 && 
(numRegionsPerServerPerTable[oldServer][tableIndex]
-        + 1) == numMaxRegionsPerTable[tableIndex]) {
-      // recompute maxRegionsPerTable since the previous value was coming from 
the old server
-      numMaxRegionsPerTable[tableIndex] = 0;
-      for (int[] aNumRegionsPerServerPerTable : numRegionsPerServerPerTable) {
-        if (aNumRegionsPerServerPerTable[tableIndex] > 
numMaxRegionsPerTable[tableIndex]) {
-          numMaxRegionsPerTable[tableIndex] = 
aNumRegionsPerServerPerTable[tableIndex];
-        }
-      }
-    }
+    // update regionSkewPerTable for the move to new server
+    regionSkewByTable[tableIndex] +=
+      Math.abs(numRegionsPerServerPerTable[newServer][tableIndex]
+        - meanRegionsPerTable[tableIndex])
+        - Math.abs(numRegionsPerServerPerTable[newServer][tableIndex] - 1
+        - meanRegionsPerTable[tableIndex]);

Review comment:
       nit: hard to read. You might do local variables just to make it easier 
in the future rather than this long line

##########
File path: 
hbase-balancer/src/main/java/org/apache/hadoop/hbase/master/balancer/DoubleArrayCost.java
##########
@@ -72,31 +72,13 @@ private static double computeCost(double[] stats) {
     double count = stats.length;
     double mean = total / count;
 
-    // Compute max as if all region servers had 0 and one had the sum of all 
costs. This must be
-    // a zero sum cost for this to make sense.
-    double max = ((count - 1) * mean) + (total - mean);
-
-    // It's possible that there aren't enough regions to go around
-    double min;
-    if (count > total) {
-      min = ((count - total) * mean) + ((1 - mean) * total);
-    } else {
-      // Some will have 1 more than everything else.
-      int numHigh = (int) (total - (Math.floor(mean) * count));
-      int numLow = (int) (count - numHigh);
-
-      min = (numHigh * (Math.ceil(mean) - mean)) + (numLow * (mean - 
Math.floor(mean)));
-
-    }
-    min = Math.max(0, min);

Review comment:
       Unused?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to