ndimiduk commented on a change in pull request #2003:
URL: https://github.com/apache/hbase/pull/2003#discussion_r452546679



##########
File path: 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestStochasticLoadBalancerRegionReplica.java
##########
@@ -168,6 +205,23 @@ public void testNeedsBalanceForColocatedReplicas() {
         new Cluster(map, null, null, new ForTestRackManagerOne())));
   }
 
+  @Test
+  public void testLocalityCost() throws Exception {
+    Configuration conf = HBaseConfiguration.create();
+    MockNoopMasterServices master = new MockNoopMasterServices();
+    StochasticLoadBalancer.CostFunction
+      costFunction = new 
StochasticLoadBalancer.ServerLocalityCostFunction(conf, master);
+
+    for (int test = 0; test < clusterRegionLocationMocks.length; test++) {
+      int[][] clusterRegionLocations = clusterRegionLocationMocks[test];
+      MockCluster cluster = new MockCluster(clusterRegionLocations, true);
+      costFunction.init(cluster);
+      double cost = costFunction.cost();
+      double expected = 1 - expectedLocalities[test];
+      assertEquals(expected, cost, 0.001);

Review comment:
       nit: How about adding a message saying which test is being run, so that 
debugging a failure is easier? Better still would be to break the two tests 
into two test methods.

##########
File path: 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestStochasticLoadBalancerRegionReplica.java
##########
@@ -49,6 +51,41 @@
   public static final HBaseClassTestRule CLASS_RULE =
       
HBaseClassTestRule.forClass(TestStochasticLoadBalancerRegionReplica.class);
 
+  // Mapping of locality test -> expected locality
+  private float[] expectedLocalities = {1.0f, 0.5f};
+
+  /**
+   * Data set for testLocalityCost:

Review comment:
       Perhaps there's a more legible representation of cluster locality than 
these jagged arrays?

##########
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
##########
@@ -1026,8 +1027,11 @@ void init(Cluster cluster) {
       }
 
       for (int region = 0; region < cluster.numRegions; region++) {
-        locality += getWeightedLocality(region, 
regionIndexToEntityIndex(region));
-        bestLocality += getWeightedLocality(region, 
getMostLocalEntityForRegion(region));
+        // Skip Replica regions, data locality should not be a factor for 
replica regions.
+        if (RegionReplicaUtil.isDefaultReplica(this.cluster.regions[region])) {

Review comment:
       I like the inverted check you do elsewhere, so you say
   ```
   if (!isDefaulReplica(cluster.regions[region])) {
     continue;
   }
   ```

##########
File path: 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/BalancerTestBase.java
##########
@@ -174,6 +174,46 @@ public void reloadCachedMappings(List<String> arg0) {
     }
   }
 
+  // This mock allows us to test the LocalityCostFunction
+  public class MockCluster extends BaseLoadBalancer.Cluster {
+
+    private int[][] localities = null;   // [region][server] = percent of 
blocks
+    private boolean firstRegionAsReplica;
+
+    public MockCluster(int[][] regions) {
+      this(regions, false);
+    }
+
+    public MockCluster(int[][] regions, final boolean firstRegionAsReplica) {

Review comment:
       This mocking is completely inscrutable to me, but I guess you simply 
moved it from `TestStochasticLoadBalancer`.

##########
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
##########
@@ -1462,8 +1473,14 @@ protected double getCostFromRl(BalancerRegionLoad rl) {
     }
 
     @Override
-    protected double getCostFromRl(BalancerRegionLoad rl) {
-      return rl.getStorefileSizeMB();
+    protected double getCostFromRl(BalancerRegionLoad rl, boolean 
isPrimaryRegion) {
+      // Do not count replica region's file size, as replica regions serve 
very little
+      // read requests, this may be changed if there are enough data from 
production showing

Review comment:
       I don't understand this comment "as replica regions serve very little 
read requests." All a replica does is serve read requests! Maybe you're trying 
to say "a replica region by definition contributes nothing to store file load 
on the region that hosts it."
   
   To this, I think it depends on how this value is being considered. This 
store file size has a real impact on IO channels and BlockCache; just 
happenstance impact on local storage IO channels. Notably, compactions impose 
no load for a replica region. I think it makes sense to continue to consider 
the store file size of read replicas, to maybe at a reduced weight. Omitting 
them entirely only makes sense if we can somehow assume that all region servers 
will serve an equal load of replicas, serving equal work, which is not 
realistic.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to