sodonnel commented on a change in pull request #3234:
URL: https://github.com/apache/hadoop/pull/3234#discussion_r676688811



##########
File path: hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
##########
@@ -515,6 +521,26 @@ private DiffInfo getRenameItem(DiffInfo diff, DiffInfo[] renameDiffArray) {
     return null;
   }
 
+  /**
+   * checks if a parent dir is marked deleted as a part of dir rename happening
+   * to a path which is excluded by the filter.
+   * @return true if it's marked deleted
+   */
+  private boolean isParentOrSelfMarkedDeleted(DiffInfo diff,
+      DiffInfo[] deletedDirDiffArray) {
+    for (DiffInfo item : deletedDirDiffArray) {
+      if (item.getSource().equals(diff.getSource())) {
+        if (diff.getType() == SnapshotDiffReport.DiffType.MODIFY) {
+          return true;
+        }
+      }
+      if (isParentOf(item.getSource(), diff.getSource())) {

Review comment:
       How does this code distinguish between this scenario:
   
   ```
   create /foo/dir/f1
   delete /foo/dir
   ```
   
   And:
   
   ```
   create /foo/dir/f1
   delete /foo/dir
   create /foo/dir  ** This will be a create in the diff and will be retained by the MODIFY filter above if item.getSource == diff.getSource.
   create /foo/dir/f1 ** This second create should be kept, but it will match the isParentOf check against the deleted /foo/dir, so what stops it getting excluded?
   ```
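
To make the concern concrete, here is a minimal, self-contained sketch of the check as it appears in the hunk above. It is not the DistCpSync code: `Diff`, `DiffType`, and the prefix-based `isParentOf` are simplified, hypothetical stand-ins, and the `return true` inside the parent branch is an assumption, since the quoted diff is cut off at that line.

```java
import java.util.List;

class DeletedParentSketch {
  // Stand-in for SnapshotDiffReport.DiffType; only the values used here.
  enum DiffType { CREATE, MODIFY, DELETE }

  // Hypothetical, simplified stand-in for DiffInfo: a source path and a type.
  static final class Diff {
    final String source;
    final DiffType type;

    Diff(String source, DiffType type) {
      this.source = source;
      this.type = type;
    }
  }

  // Assumed semantics: true when parent is a strict ancestor of child.
  static boolean isParentOf(String parent, String child) {
    String prefix = parent.endsWith("/") ? parent : parent + "/";
    return child.startsWith(prefix);
  }

  // Mirrors the visible part of the patch.
  static boolean isParentOrSelfMarkedDeleted(Diff diff, List<Diff> deletedDirs) {
    for (Diff item : deletedDirs) {
      if (item.source.equals(diff.source) && diff.type == DiffType.MODIFY) {
        return true;
      }
      if (isParentOf(item.source, diff.source)) {
        return true; // assumption: this branch body is not shown in the diff
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // /foo/dir was deleted at some point between the two snapshots.
    List<Diff> deleted = List.of(new Diff("/foo/dir", DiffType.DELETE));

    // Second scenario: /foo/dir and /foo/dir/f1 are re-created afterwards.
    Diff recreatedDir = new Diff("/foo/dir", DiffType.CREATE);
    Diff recreatedFile = new Diff("/foo/dir/f1", DiffType.CREATE);

    System.out.println("/foo/dir marked deleted: "
        + isParentOrSelfMarkedDeleted(recreatedDir, deleted));
    System.out.println("/foo/dir/f1 marked deleted: "
        + isParentOrSelfMarkedDeleted(recreatedFile, deleted));
  }
}
```

Under these assumptions the re-created `/foo/dir` is not flagged, because its type is CREATE rather than MODIFY, but the re-created `/foo/dir/f1` is flagged, because the deleted `/foo/dir` is its parent. If the real branch behaves the same way, the second scenario would lose the new file.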




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]