ayushtkn edited a comment on pull request #3940: URL: https://github.com/apache/hadoop/pull/3940#issuecomment-1035009857
@saintstack The path in the diff entry is actually relative: https://github.com/apache/hadoop/blob/efdec92cab8a88deb5ec9e81f5c8feb7a0fa873b/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SnapshotDiffReport.java#L85-L90

For rename entries it is made absolute here: https://github.com/apache/hadoop/blob/efdec92cab8a88deb5ec9e81f5c8feb7a0fa873b/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java#L471-L474

A normal delete has no target; it is always ``null``, so in the normal case the entry is added as-is: https://github.com/apache/hadoop/blob/efdec92cab8a88deb5ec9e81f5c8feb7a0fa873b/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java#L465-L468

This particular case arises when filters are used. The actual entry is a ``RENAME`` entry, and a rename has to have a target, so it takes this else block: https://github.com/apache/hadoop/blob/efdec92cab8a88deb5ec9e81f5c8feb7a0fa873b/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java#L254

When the entry is converted to a ``DELETE`` entry, the target is added as well: https://github.com/apache/hadoop/blob/efdec92cab8a88deb5ec9e81f5c8feb7a0fa873b/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java#L262-L265

But since it is now a delete entry, the path isn't made absolute with respect to the target, so it stays a relative path, like `filterDir1`. Because it doesn't start with `/`, the normal logic resolves it against the home directory by default. Then the code that you shared does the magic, it moves it...
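To make the resolution difference described above concrete, here is a minimal sketch. It is not DistCpSync code: the class, the method names, and the directories `/user/alice` and `/backup/data` are hypothetical, and plain strings stand in for Hadoop's `Path`. It only assumes, as stated above, that a path not starting with `/` is resolved against the home directory by default.

```java
// Hypothetical illustration only; none of these names exist in Hadoop.
class RelativeDeleteSketch {

  // Models the default resolution: a path starting with "/" is used as-is,
  // anything else is resolved against the working directory (assumed here
  // to be the user's home directory).
  static String resolveDefault(String workingDir, String path) {
    return path.startsWith("/") ? path : workingDir + "/" + path;
  }

  // Models what happens for rename entries: the relative path is anchored
  // at the sync target directory, so it becomes absolute.
  static String resolveAgainstTarget(String targetDir, String relativePath) {
    return targetDir + "/" + relativePath;
  }

  public static void main(String[] args) {
    String home = "/user/alice";       // hypothetical home directory
    String targetDir = "/backup/data"; // hypothetical distcp target directory
    String relative = "filterDir1";    // relative path carried by the entry

    // Delete entry converted from a filtered rename: path is left relative,
    // so it ends up under the home directory.
    System.out.println(resolveDefault(home, relative));

    // Rename entry: path is made absolute with respect to the target.
    System.out.println(resolveAgainstTarget(targetDir, relative));
  }
}
```

In this sketch the same `filterDir1` ends up under the home directory on the delete path and under the target directory on the rename path, which is the mismatch described above.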
One example of the target being set to ``null``: https://github.com/apache/hadoop/blob/efdec92cab8a88deb5ec9e81f5c8feb7a0fa873b/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java#L446-L447

Maybe there could be a sanity check on the target in the delete diff, but I'm not very confident about that part. I will explore at some point whether there is any use case where it can be non-null, along with the compat implications.

Further general optimisations are possible as well, like deleting directly instead of renaming to tmp and then deleting (there is a reason why it is like that). That is on my TODO list; I will chase it in the future.

General info: filters are used quite a lot in DR setups, where sometimes we don't want to copy some data to replica clusters. One example is Trash data; there are many other use cases as well.

Let me know if it still isn't convincing.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
