siddhantsangwan commented on code in PR #6305:
URL: https://github.com/apache/ozone/pull/6305#discussion_r1547327496
##########
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/balancer/TestContainerBalancerTask.java:
##########
@@ -1114,6 +1114,35 @@ public void
balancerShouldExcludeECContainersWhenLegacyRmIsEnabled()
}
}
+ @Test
Review Comment:
Let's add java doc and comments explaining what this test is doing as it may
not be obvious to someone else.
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/FindSourceGreedy.java:
##########
@@ -138,6 +138,14 @@ public void removeCandidateSourceDataNode(DatanodeDetails
dui) {
potentialSources.removeIf(a -> a.getDatanodeDetails().equals(dui));
}
+ @Override
+ public void addBackSourceDataNode(DatanodeDetails dn) {
+ DatanodeUsageInfo dui = nodeManager.getUsageInfo(dn);
+ if (!potentialSources.contains(dui)) {
Review Comment:
We don't want to pay the performance cost of `contains` for a priority queue
every time we're adding back a source. Let's avoid this check.
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/FindSourceStrategy.java:
##########
@@ -45,6 +45,16 @@ public interface FindSourceStrategy {
*/
void removeCandidateSourceDataNode(DatanodeDetails dui);
+ /**
+ * add the specified data node to the candidate source
+ * data nodes.
+ * Caller of this method must ensure that Datanode is
+ * not currently a candidate source
Review Comment:
```suggestion
* This method does not check whether the specified Datanode is already
present in the Collection. Callers must take the responsibility of checking and
removing the Datanode before adding, if required.
```
##########
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/balancer/TestContainerBalancerTask.java:
##########
@@ -1114,6 +1114,35 @@ public void
balancerShouldExcludeECContainersWhenLegacyRmIsEnabled()
}
}
+ @Test
+ public void testSourceDatanodeAddedBack()
+ throws NodeNotFoundException, IOException,
IllegalContainerBalancerStateException,
+ InvalidContainerBalancerConfigurationException, TimeoutException,
InterruptedException {
+
+ when(moveManager.move(any(ContainerID.class),
+ any(DatanodeDetails.class),
+ any(DatanodeDetails.class)))
+
.thenReturn(CompletableFuture.completedFuture(MoveManager.MoveResult.REPLICATION_FAIL_NOT_EXIST_IN_SOURCE))
+
.thenReturn(CompletableFuture.completedFuture(MoveManager.MoveResult.COMPLETED));
+ balancerConfiguration.setThreshold(10);
+ balancerConfiguration.setIterations(1);
+ balancerConfiguration.setMaxSizeEnteringTarget(10 * STORAGE_UNIT);
+ balancerConfiguration.setMaxSizeToMovePerIteration(100 * STORAGE_UNIT);
+ balancerConfiguration.setMaxDatanodesPercentageToInvolvePerIteration(100);
+ String includeNodes =
nodesInCluster.get(0).getDatanodeDetails().getHostName() + "," +
+ nodesInCluster.get(nodesInCluster.size() -
1).getDatanodeDetails().getHostName();
+ balancerConfiguration.setIncludeNodes(includeNodes);
+
+ startBalancer(balancerConfiguration);
+ GenericTestUtils.waitFor(() ->
ContainerBalancerTask.IterationResult.ITERATION_COMPLETED ==
+ containerBalancerTask.getIterationResult(), 10, 50);
+
+ assertEquals(2,
containerBalancerTask.getCountDatanodesInvolvedPerIteration());
Review Comment:
Let's also assert that the two DNs involved are the ones we expect
(specified in includeNodes).
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/ContainerBalancerTask.java:
##########
@@ -881,6 +896,14 @@ private boolean moveContainer(DatanodeDetails source,
} else {
MoveManager.MoveResult result = future.join();
moveSelectionToFutureMap.put(moveSelection, future);
+ if (result ==
MoveManager.MoveResult.REPLICATION_FAIL_NOT_EXIST_IN_SOURCE ||
+ result == MoveManager.MoveResult.REPLICATION_FAIL_EXIST_IN_TARGET
||
+ result ==
MoveManager.MoveResult.REPLICATION_FAIL_CONTAINER_NOT_CLOSED ||
+ result ==
MoveManager.MoveResult.REPLICATION_FAIL_INFLIGHT_DELETION ||
+ result ==
MoveManager.MoveResult.REPLICATION_FAIL_INFLIGHT_REPLICATION) {
+ findSourceStrategy.addBackSourceDataNode(source);
+
selectionCriteria.addToExcludeDueToFailContainers(moveSelection.getContainerID());
Review Comment:
We don't need to exclude the container here I think. These seem to be
replica specific and intermittent failures.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]