siddhantsangwan commented on code in PR #6305:
URL: https://github.com/apache/ozone/pull/6305#discussion_r1522691055


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/ContainerBalancerTask.java:
##########
@@ -856,19 +860,34 @@ private boolean moveContainer(DatanodeDetails source,
                     " {} failed: {}",
                 moveSelection.getContainerID(), source.getUuidString(),
                 moveSelection.getTargetNode().getUuidString(), result);
+            // add source back if move failed due to container related errors
+            if (result == 
MoveManager.MoveResult.REPLICATION_FAIL_NOT_EXIST_IN_SOURCE ||
+                result == 
MoveManager.MoveResult.REPLICATION_FAIL_EXIST_IN_TARGET ||
+                result == 
MoveManager.MoveResult.REPLICATION_FAIL_CONTAINER_NOT_CLOSED ||
+                result == 
MoveManager.MoveResult.REPLICATION_FAIL_INFLIGHT_DELETION ||
+                result == 
MoveManager.MoveResult.REPLICATION_FAIL_INFLIGHT_REPLICATION) {
+              findSourceStrategy.addBackSourceDataNode(source);
+            }

Review Comment:
   We need to handle the case where the scheduled move completes with a 
`MoveResult` other than `COMPLETED`, before the if...else code of `if 
(future.isDone())` gets executed. In this case, this line `return result == 
MoveManager.MoveResult.COMPLETED;` will evaluate to false and the source 
datanode will not get added back. However, instead of handling this case in the 
whenComplete block, I suggest handling it this way:
   ```
       if (future.isDone()) {
         if (future.isCompletedExceptionally()) {
           return false;
         } else {
           MoveManager.MoveResult result = future.join();
           moveSelectionToFutureMap.put(moveSelection, future);
           if (result == 
MoveManager.MoveResult.REPLICATION_FAIL_NOT_EXIST_IN_SOURCE ||
               result == 
MoveManager.MoveResult.REPLICATION_FAIL_EXIST_IN_TARGET ||
               result == 
MoveManager.MoveResult.REPLICATION_FAIL_CONTAINER_NOT_CLOSED ||
               result == 
MoveManager.MoveResult.REPLICATION_FAIL_INFLIGHT_DELETION ||
               result == 
MoveManager.MoveResult.REPLICATION_FAIL_INFLIGHT_REPLICATION) {
             findSourceStrategy.addBackSourceDataNode(source);
           }
           return result == MoveManager.MoveResult.COMPLETED;
         }
       } else {
         moveSelectionToFutureMap.put(moveSelection, future);
         return true;
       }
   ```
    
   This will achieve the same result and I think is a better way mainly because 
of thread safety issues. Any code in the `whenComplete` block can be called 
from a thread other than the balancing thread. This means we need to care about 
thread safety problems, such as concurrent accesses to the priority queue that 
`findSourceStrategy.addBackSourceDataNode(source)` is changing. We're better 
off avoiding this to prevent multithreading issues.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to