sodonnel commented on a change in pull request #2903:
URL: https://github.com/apache/ozone/pull/2903#discussion_r766200831
##########
File path:
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/replication/SimpleContainerDownloader.java
##########
@@ -74,38 +74,28 @@ public SimpleContainerDownloader(
public CompletableFuture<Path> getContainerDataFromReplicas(
long containerId, List<DatanodeDetails> sourceDatanodes) {
- CompletableFuture<Path> result = null;
-
final List<DatanodeDetails> shuffledDatanodes =
shuffleDatanodes(sourceDatanodes);
for (DatanodeDetails datanode : shuffledDatanodes) {
try {
- if (result == null) {
- result = downloadContainer(containerId, datanode);
- } else {
-
- result = result.exceptionally(t -> {
- LOG.error("Error on replicating container: " + containerId, t);
- try {
- return downloadContainer(containerId, datanode).get();
- } catch (ExecutionException | IOException e) {
- LOG.error("Error on replicating container: " + containerId,
- e);
- } catch (InterruptedException e) {
- Thread.currentThread().interrupt();
- }
- return null;
- });
Review comment:
I'm not sure. I looked at the original code for a long time to try to
understand it, which is the problem I am trying to solve. I don't think its
intention / logic is clear. This is the original code:
```
@Override
public CompletableFuture<Path> getContainerDataFromReplicas(
long containerId, List<DatanodeDetails> sourceDatanodes) {
CompletableFuture<Path> result = null;
final List<DatanodeDetails> shuffledDatanodes =
shuffleDatanodes(sourceDatanodes);
for (DatanodeDetails datanode : shuffledDatanodes) {
try {
if (result == null) {
result = downloadContainer(containerId, datanode);
} else {
result = result.exceptionally(t -> {
LOG.error("Error on replicating container: " + containerId, t);
try {
return downloadContainer(containerId, datanode).get();
} catch (ExecutionException | IOException e) {
LOG.error("Error on replicating container: " + containerId,
e);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
return null;
});
}
} catch (Exception ex) {
LOG.error(String.format(
"Container %s download from datanode %s was unsuccessful. "
+ "Trying the next datanode", containerId, datanode), ex);
}
}
return result;
```
My reading of it:
* On the first pass of the loop, result will be null so the download is
attempted via the future. However we don't check for completion, so the loop
must go into a second pass.
* On the second pass (assuming there is more than 1 DN). Result will be
non-null, so it will call `result.exceptionally()`. Reading the java doc for
it, which is confusing to say the least, I think it will "complete" the future
if it was successful, or run the lambda if not. At this stage, the lambda is
using the second DN in the list rather than the first, so the retry will be on
the second DN, not a second try on the first.
* If there is a 3rd DN in the list, I think it will still loop again, as
there is no break in the loop and no return (aside from the one in the lambda),
but this time result is non-null and completed if the second pass worked, so I
am guessing calling exceptionally on it does nothing, but I am not certain.
* Then the loop exits and the method returns.
If there is only a single DN in the list, then I think it doesn't actually
ever call exceptionally, so the future doesn't get completed until the method
returns in `DownloadAndImportReplicator.replicate()` where we call "get" on the
future.
I think the new code performs the same number of retries, but I hope, in an
easier to follow way.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]