[
https://issues.apache.org/jira/browse/HADOOP-18096?focusedWorklogId=724549&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-724549
]
ASF GitHub Bot logged work on HADOOP-18096:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 10/Feb/22 14:49
Start Date: 10/Feb/22 14:49
Worklog Time Spent: 10m
Work Description: ayushtkn commented on pull request #3940:
URL: https://github.com/apache/hadoop/pull/3940#issuecomment-1035009857
@saintstack
The path is actually relative. For rename entries it is made absolute here:
https://github.com/apache/hadoop/blob/efdec92cab8a88deb5ec9e81f5c8feb7a0fa873b/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java#L471-L474
For a normal delete there is no target; it is always ``null``, so the entry
is added as-is in the normal case.
https://github.com/apache/hadoop/blob/efdec92cab8a88deb5ec9e81f5c8feb7a0fa873b/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java#L465-L468
In this particular case, when using filters, the actual entry is a ``RENAME``
entry, which has a target (a rename has to have one). So it takes this else
block:
https://github.com/apache/hadoop/blob/efdec92cab8a88deb5ec9e81f5c8feb7a0fa873b/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java#L254
And when converting it to a ``DELETE`` entry, it still adds the target:
https://github.com/apache/hadoop/blob/efdec92cab8a88deb5ec9e81f5c8feb7a0fa873b/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java#L262-L265
But since it is a delete entry, the path isn't made absolute with respect to
the target, so it stays a relative path such as `filterDir1`. Since it doesn't
start with /, the normal logic resolves it against the home directory by
default. Then the code that you shared does the magic: it moves it...
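To make the resolution step concrete, here is a minimal Java sketch. It does
not use Hadoop's `Path` or `FileSystem` classes; `qualify`, the home directory
`/user/alice` and the target directory `/backup/dst` are made-up names that
only model the behaviour described above.

```java
// Hypothetical model of the behaviour described above; not DistCp code.
final class RelativeDeleteTargetSketch {

  // Absolute paths are kept as they are; relative paths are placed
  // under the given base directory.
  static String qualify(String path, String baseDir) {
    if (path.startsWith("/")) {
      return path;
    }
    return baseDir.endsWith("/") ? baseDir + path : baseDir + "/" + path;
  }

  public static void main(String[] args) {
    String home = "/user/alice";       // assumed default working directory
    String targetDir = "/backup/dst";  // assumed sync target directory

    // Rename entries: the relative target is resolved against the target dir.
    System.out.println(qualify("filterDir1", targetDir));

    // Delete entries: nothing resolves the leftover target, so it falls
    // back to the working directory, which is the home directory here.
    System.out.println(qualify("filterDir1", home));
  }
}
```

With these assumed values the first line printed is under the sync target and
the second is under the home directory, which is where the file ends up.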
One example of the target being set to ``null``:
https://github.com/apache/hadoop/blob/efdec92cab8a88deb5ec9e81f5c8feb7a0fa873b/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java#L446-L447
Maybe there could be a sanity check on the target in the delete diff, but I
am not very confident about that part. I will explore at some point whether
there is any use case where it can be non-null, and any compat concerns.
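A rough sketch of what such a sanity check could look like, assuming delete
entries should never carry a target (that assumption is exactly the open
question above). `Entry`, `toDelete` and `checkDelete` are hypothetical names
for illustration, not the types or methods in `DistCpSync`.

```java
// Hypothetical sketch; the real diff entry type in DistCp is different.
final class DeleteDiffSanitySketch {

  enum Type { DELETE, RENAME }

  static final class Entry {
    final Type type;
    final String source;
    final String target;  // null when the entry has no target

    Entry(Type type, String source, String target) {
      this.type = type;
      this.source = source;
      this.target = target;
    }
  }

  // Convert a rename whose target is filtered out into a delete of the
  // source, dropping the target instead of carrying it along.
  static Entry toDelete(Entry rename) {
    return new Entry(Type.DELETE, rename.source, null);
  }

  // Assumed invariant: a delete entry has no target.
  static void checkDelete(Entry entry) {
    if (entry.type == Type.DELETE && entry.target != null) {
      throw new IllegalStateException(
          "DELETE entry for " + entry.source
              + " has unexpected target " + entry.target);
    }
  }
}
```

Failing fast here would surface a stray relative target before anything is
moved, at the cost of rejecting any legitimate non-null case that may exist.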
Further general optimisations are possible as well, such as deleting directly
instead of renaming to tmp and then deleting (there is a reason why it is like
that today). That is on my TODO list; I will chase it in the future.
General info: filters are used quite a lot in DR setups, where sometimes we
don't want to copy some data to replica clusters. One example is Trash data;
there are many other use cases as well.
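As an illustration of the Trash case, a minimal regex-based exclusion filter
might look like the sketch below. It uses only `java.util.regex`; the class
name, `shouldCopy` and the `.Trash` pattern are assumptions for the example,
not DistCp's filter API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical exclusion filter; not DistCp's filter implementation.
final class ExclusionFilterSketch {

  private final List<Pattern> excludes = new ArrayList<>();

  ExclusionFilterSketch(List<String> regexes) {
    for (String regex : regexes) {
      excludes.add(Pattern.compile(regex));
    }
  }

  // A path is copied only if no exclusion pattern matches the whole path.
  boolean shouldCopy(String path) {
    for (Pattern pattern : excludes) {
      if (pattern.matcher(path).matches()) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    ExclusionFilterSketch filter =
        new ExclusionFilterSketch(Arrays.asList(".*/\\.Trash/.*"));
    System.out.println(filter.shouldCopy("/user/alice/.Trash/Current/old.txt"));
    System.out.println(filter.shouldCopy("/data/warehouse/part-0000"));
  }
}
```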
Let me know if it still isn't convincing.
Issue Time Tracking
-------------------
Worklog Id: (was: 724549)
Time Spent: 1h 50m (was: 1h 40m)
> Distcp: Sync moves filtered file to home directory rather than deleting
> -----------------------------------------------------------------------
>
> Key: HADOOP-18096
> URL: https://issues.apache.org/jira/browse/HADOOP-18096
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Critical
> Labels: pull-request-available
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> Distcp sync with snapshot, if the file being copied is renamed to a path
> which is in the exclusion filter, tries to delete the file.
> But instead of deleting, it moves the file to home directory
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)