[GitHub] [ozone] sodonnel commented on a change in pull request #2903: HDDS-6080. Simplify the logic in SimpleContainerDownloader.getContainerDataFromReplicas

GitBox Thu, 09 Dec 2021 14:23:35 -0800


sodonnel commented on a change in pull request #2903:
URL: https://github.com/apache/ozone/pull/2903#discussion_r766200831




##########
File path: 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/replication/SimpleContainerDownloader.java
##########
@@ -74,38 +74,28 @@ public SimpleContainerDownloader(
   public CompletableFuture<Path> getContainerDataFromReplicas(
       long containerId, List<DatanodeDetails> sourceDatanodes) {
 
-    CompletableFuture<Path> result = null;
-
     final List<DatanodeDetails> shuffledDatanodes =
         shuffleDatanodes(sourceDatanodes);
 
     for (DatanodeDetails datanode : shuffledDatanodes) {
       try {
-        if (result == null) {
-          result = downloadContainer(containerId, datanode);
-        } else {
-
-          result = result.exceptionally(t -> {
-            LOG.error("Error on replicating container: " + containerId, t);
-            try {
-              return downloadContainer(containerId, datanode).get();
-            } catch (ExecutionException | IOException e) {
-              LOG.error("Error on replicating container: " + containerId,
-                  e);
-            } catch (InterruptedException e) {
-              Thread.currentThread().interrupt();
-            }
-            return null;
-          });

Review comment:
       I'm not sure. I looked at the original code for a long time to try to 
understand it, which is the problem I am trying to solve. I don't think its 
intention / logic is clear. This is the original code:
   
   ```
     @Override
     public CompletableFuture<Path> getContainerDataFromReplicas(
         long containerId, List<DatanodeDetails> sourceDatanodes) {
   
       CompletableFuture<Path> result = null;
   
       final List<DatanodeDetails> shuffledDatanodes =
           shuffleDatanodes(sourceDatanodes);
   
       for (DatanodeDetails datanode : shuffledDatanodes) {
         try {
           if (result == null) {
             result = downloadContainer(containerId, datanode);
           } else {
   
             result = result.exceptionally(t -> {
               LOG.error("Error on replicating container: " + containerId, t);
               try {
                 return downloadContainer(containerId, datanode).get();
               } catch (ExecutionException | IOException e) {
                 LOG.error("Error on replicating container: " + containerId,
                     e);
               } catch (InterruptedException e) {
                 Thread.currentThread().interrupt();
               }
               return null;
             });
           }
         } catch (Exception ex) {
           LOG.error(String.format(
               "Container %s download from datanode %s was unsuccessful. "
                   + "Trying the next datanode", containerId, datanode), ex);
         }
       }
       return result;
   ```
   
   My reading of it:
   
    * On the first pass of the loop, result will be null so the download is 
attempted via the future. However we don't check for completion, so the loop 
must go into a second pass.
    * On the second pass (assuming there is more than 1 DN). Result will be 
non-null, so it will call `result.exceptionally()`. Reading the java doc for 
it, which is confusing to say the least, I think it will "complete" the future 
if it was successful, or run the lambda if not. At this stage, the lambda is 
using the second DN in the list rather than the first, so the retry will be on 
the second DN, not a second try on the first.
    * If there is a 3rd DN in the list, I think it will still loop again, as 
there is no break in the loop and no return (aside from the one in the lambda), 
but this time result is non-null and completed if the second pass worked, so I 
am guessing calling exceptionally on it does nothing, but I am not certain.
    * Then the loop exits and the method returns.
    
    If there is only a single DN in the list, then I think it doesn't actually 
ever call exceptionally, so the future doesn't get completed until the method 
returns in `DownloadAndImportReplicator.replicate()` where we call "get" on the 
future.
   
   I think the new code performs the same number of retries, but I hope, in an 
easier to follow way.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [ozone] sodonnel commented on a change in pull request #2903: HDDS-6080. Simplify the logic in SimpleContainerDownloader.getContainerDataFromReplicas

Reply via email to