[
https://issues.apache.org/jira/browse/HADOOP-18096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490514#comment-17490514
]
Ayush Saxena commented on HADOOP-18096:
---------------------------------------
Just copying the comment from the PR for posterity:
The path is actually relative:
hadoop/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SnapshotDiffReport.java
Lines 85 to 90 in efdec92
/**
* The relative path (related to the snapshot root) of 1) the file/directory
* where changes have happened, or 2) the source file/dir of a rename op.
*/
private final byte[] sourcePath;
private final byte[] targetPath;
For rename entries it is made absolute here:
hadoop/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
Lines 471 to 474 in efdec92
for (DiffInfo diff : diffMap.get(SnapshotDiffReport.DiffType.RENAME)) {
Path source = new Path(targetDir, diff.getSource());
Path target = new Path(targetDir, diff.getTarget());
renameAndDeleteDiff.add(new DiffInfo(source, target, diff.getType()));
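Here is a minimal stand-alone sketch of what the RENAME loop does. It assumes POSIX-style string paths instead of Hadoop's `Path` class. `DiffInfo` is a hypothetical record that mirrors only the three fields used above, and `resolve` is a simplified stand-in for `new Path(targetDir, child)`. Both the source and the target of a RENAME end up absolute under the target directory.

```java
import java.util.ArrayList;
import java.util.List;

class RenameAbsolutizeSketch {
    // Hypothetical stand-in for DistCpSync's DiffInfo (not the real Hadoop class).
    record DiffInfo(String source, String target, String type) {}

    // Simplified stand-in for Path(parent, child): an absolute child wins,
    // a relative child is appended to the parent.
    static String resolve(String parent, String child) {
        if (child.startsWith("/")) {
            return child;
        }
        return parent.endsWith("/") ? parent + child : parent + "/" + child;
    }

    // Mirrors the RENAME loop: source and target both become absolute under targetDir.
    static List<DiffInfo> absolutizeRenames(String targetDir, List<DiffInfo> renames) {
        List<DiffInfo> out = new ArrayList<>();
        for (DiffInfo diff : renames) {
            String source = resolve(targetDir, diff.source());
            String target = resolve(targetDir, diff.target());
            out.add(new DiffInfo(source, target, diff.type()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<DiffInfo> renames = List.of(new DiffInfo("dir1/a", "dir2/a", "RENAME"));
        System.out.println(absolutizeRenames("/backup", renames));
    }
}
```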
For a normal DELETE entry there is no target (it is always null), so in normal
cases the entry is added as-is, with only the source made absolute:
hadoop/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
Lines 465 to 468 in efdec92
for (DiffInfo diff : diffMap.get(SnapshotDiffReport.DiffType.DELETE)) {
Path source = new Path(targetDir, diff.getSource());
renameAndDeleteDiff.add(new DiffInfo(source, diff.getTarget(),
diff.getType()));
In this particular case, when filters are used, the actual entry is a RENAME
entry, which has a target (a rename always has one). So it takes this else
block:
hadoop/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
Line 254 in efdec92
} else if (dt == SnapshotDiffReport.DiffType.RENAME) {
When it converts the entry to a DELETE entry, it also sets the target:
hadoop/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
Lines 262 to 265 in efdec92
list = diffMap.get(SnapshotDiffReport.DiffType.DELETE);
DiffInfo info = new DiffInfo(source, target,
SnapshotDiffReport.DiffType.DELETE);
list.add(info);
But since it is a DELETE entry, the target path is not made absolute with
respect to the target directory. So it stays a relative path, like filterDir1,
and since it doesn't start with /, the normal path resolution logic resolves it
against the home directory by default.
Then the code that you shared does the rest and moves the file there, into the
home directory.
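The whole chain can be modelled in a small stand-alone sketch. Everything in it is hypothetical and simplified. `DiffInfo` is a plain record, and `isExcluded` stands in for the real copy filter. `resolve` approximates `new Path(parent, child)`. `qualify` models a file system resolving a relative path against its working directory. On HDFS that working directory defaults to the home directory, `/user/<name>`, and `/user/hdfs` is just an example value. The sketch covers only the case from this issue: a rename into an excluded path.

```java
import java.util.function.Predicate;

class FilteredRenameSketch {
    // Hypothetical stand-in for DistCpSync's DiffInfo (not the real Hadoop class).
    record DiffInfo(String source, String target, String type) {}

    // Simplified stand-in for Path(parent, child): an absolute child wins,
    // a relative child is appended to the parent.
    static String resolve(String parent, String child) {
        if (child.startsWith("/")) {
            return child;
        }
        return parent.endsWith("/") ? parent + child : parent + "/" + child;
    }

    // Step 1: a RENAME whose destination is excluded becomes a DELETE, but the
    // (relative) target is carried over. Other combinations are out of scope
    // for this sketch and are returned unchanged.
    static DiffInfo convertFilteredRename(DiffInfo rename, Predicate<String> isExcluded) {
        if (!isExcluded.test(rename.source()) && isExcluded.test(rename.target())) {
            return new DiffInfo(rename.source(), rename.target(), "DELETE");
        }
        return rename;
    }

    // Step 2: the DELETE loop makes only the source absolute under targetDir;
    // the target is passed through untouched.
    static DiffInfo absolutizeDelete(String targetDir, DiffInfo diff) {
        return new DiffInfo(resolve(targetDir, diff.source()), diff.target(), diff.type());
    }

    // Step 3 (modelled, not Hadoop code): a relative path is qualified against
    // the working directory, which defaults to the user's home directory.
    static String qualify(String workingDir, String path) {
        return resolve(workingDir, path);
    }

    public static void main(String[] args) {
        Predicate<String> isExcluded = p -> p.startsWith("filterDir1");
        DiffInfo rename = new DiffInfo("dir1", "filterDir1", "RENAME");
        DiffInfo delete = absolutizeDelete("/backup", convertFilteredRename(rename, isExcluded));
        System.out.println(delete.type() + " source: " + delete.source());
        System.out.println("target resolves to: " + qualify("/user/hdfs", delete.target()));
    }
}
```

Running `main` prints the DELETE source as `/backup/dir1` and the target as `/user/hdfs/filterDir1`. That is the home-directory location the issue describes.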
One example of the target being set to null:
hadoop/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
Lines 446 to 447 in efdec92
renameAndDeleteDiff.add(new DiffInfo(source, null,
SnapshotDiffReport.DiffType.DELETE));
Maybe there could be a sanity check on the target in the delete diff, but I am
not very confident about that part. I will explore at some point whether there
is any use case where it can be non-null, and what the compatibility impact is.
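Here is one possible shape for that sanity check, as a hypothetical sketch that is not taken from the PR. A DELETE entry with a non-null target is either rejected (strict mode) or has its target dropped, so nothing downstream can resolve it against the home directory. `DiffInfo` and `sanitizeDelete` are illustrative names. The sketch leaves open whether a non-null target can legitimately occur, which is the compatibility question above.

```java
class DeleteTargetCheckSketch {
    // Hypothetical stand-in for DistCpSync's DiffInfo (not the real Hadoop class).
    record DiffInfo(String source, String target, String type) {}

    // Hypothetical sanity check for DELETE entries. Non-DELETE entries and
    // DELETE entries with a null target pass through unchanged. In strict mode
    // a non-null target is treated as a bug; otherwise the target is dropped,
    // matching the null target used for ordinary DELETE entries.
    static DiffInfo sanitizeDelete(DiffInfo diff, boolean strict) {
        if (!"DELETE".equals(diff.type()) || diff.target() == null) {
            return diff;
        }
        if (strict) {
            throw new IllegalStateException("DELETE diff has unexpected target: " + diff.target());
        }
        return new DiffInfo(diff.source(), null, diff.type());
    }

    public static void main(String[] args) {
        DiffInfo d = new DiffInfo("/backup/dir1", "filterDir1", "DELETE");
        System.out.println(sanitizeDelete(d, false)); // target is dropped
    }
}
```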
Further general optimisations are possible as well, like deleting directly
instead of renaming to tmp and then deleting (there is a reason why it is done
that way). That is on my TODO list, and I will chase it in the future.
General info: filters are quite commonly used in DR setups, since sometimes we
don't want to copy some data to replica clusters. One example is Trash data;
there are many other use cases as well.
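DistCp's `-filters` option points to a file of `java.util.regex` patterns, one per line, and paths matching any of them are excluded from the copy. The sketch below models only that matching step, assuming whole-path matching. `RegexExclusionSketch` and the Trash pattern are illustrative, not a recommended configuration.

```java
import java.util.List;
import java.util.regex.Pattern;

class RegexExclusionSketch {
    private final List<Pattern> patterns;

    RegexExclusionSketch(List<String> regexes) {
        this.patterns = regexes.stream().map(Pattern::compile).toList();
    }

    // A path is copied only if no pattern matches the whole path string.
    boolean shouldCopy(String path) {
        for (Pattern p : patterns) {
            if (p.matcher(path).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative pattern: skip anything under a .Trash directory.
        RegexExclusionSketch filter = new RegexExclusionSketch(List.of(".*/\\.Trash/.*"));
        System.out.println(filter.shouldCopy("/user/hdfs/.Trash/Current/f")); // false
        System.out.println(filter.shouldCopy("/data/f"));                     // true
    }
}
```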
It looks better at:
[https://github.com/apache/hadoop/pull/3940#issuecomment-1035009857]
> Distcp: Sync moves filtered file to home directory rather than deleting
> -----------------------------------------------------------------------
>
> Key: HADOOP-18096
> URL: https://issues.apache.org/jira/browse/HADOOP-18096
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Critical
> Labels: pull-request-available
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> With Distcp sync using snapshots, if the file being copied is renamed to a path
> which is in the exclusion filter, Distcp tries to delete the file.
> But instead of deleting it, it moves the file to the home directory.
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)