[ https://issues.apache.org/jira/browse/MAPREDUCE-7287?focusedWorklogId=576482&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-576482 ]
ASF GitHub Bot logged work on MAPREDUCE-7287:
---------------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Apr/21 20:07
            Start Date: 03/Apr/21 20:07
    Worklog Time Spent: 10m
      Work Description:
ayushtkn commented on pull request #2852:
URL: https://github.com/apache/hadoop/pull/2852#issuecomment-812918934

Apart from the checkstyle issue, the changes LGTM.

Removing the code shouldn't be a problem; it is used in `validatePaths`, which won't break when the actual dest is passed. The present code passes `/NONE` as the target, a path that doesn't actually exist, on the assumption that the target path hardly matters when building the listing. The problem shows up in `SimpleCopyListing#computeSourceRootPath` when the source is a solitary file, because that method has a check which relies on the target:

```
if (solitaryFile) {
  if (!targetPathExists || targetFS.isFile(target)) {
    return sourceStatus.getPath();
  } else {
    return sourceStatus.getPath().getParent();
  }
}
```

Here `targetFS.isFile(target)` returns `false` because of the FNF (file not found) on `/NONE`, which was not the case when the listing was built for the source. So the source listing is built with `sourceStatus.getPath()` and the target listing with `sourceStatus.getPath().getParent()`, and on comparison in `CopyCommitter` the file gets deleted.

So passing `/NONE` wasn't a good idea. The present fix, passing the actual destination, seems fair enough.

Just give it a check: I guess these constants can be removed now?

```
/**
 * Constants for NONE file deletion
 */
public static final String NONE_PATH_NAME = "/NONE";
public static final Path NONE_PATH = new Path(NONE_PATH_NAME);
public static final Path RAW_NONE_PATH = new Path(
    DistCpConstants.HDFS_RESERVED_RAW_DIRECTORY_NAME + NONE_PATH_NAME);
```

Nice catch!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 576482)
    Time Spent: 40m  (was: 0.5h)

> Distcp will delete exists file ,  If we use "--delete and --update" options and distcp file.
> --------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7287
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7287
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: distcp
>    Affects Versions: 3.2.1
>            Reporter: zhengchenyu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.3.1
>
>         Attachments: MAPREDUCE-7287.001.patch
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> hdfs://ns1/tmp/a is an existing file, and hdfs://ns2/tmp/a is also an existing file.
> When I run this command,
> {code:java}
> hadoop distcp -delete -update hdfs://ns1/tmp/a hdfs://ns2/tmp/a
> {code}
> I found that hdfs://ns2/tmp/a is deleted unexpectedly.
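
The mismatch described in the review comment can be illustrated with a small standalone sketch. It is not Hadoop code: plain strings stand in for `Path` and `FileStatus`, the class and helper names (`SolitaryFileRootSketch`, `listingKey`, `relativeKey`, `parentOf`) are hypothetical, and it assumes `targetPathExists` is true in both passes, which is what the comment's reasoning implies. It only mirrors the quoted solitary-file branch and compares the relative key each listing would produce for `/tmp/a`.

```java
// Sketch only: plain strings stand in for Hadoop Path/FileStatus objects.
final class SolitaryFileRootSketch {

    // Mirrors the quoted solitary-file branch; parameter names are hypothetical.
    static String computeSourceRootPath(String sourcePath,
                                        boolean targetPathExists,
                                        boolean targetIsFile) {
        if (!targetPathExists || targetIsFile) {
            return sourcePath;
        }
        return parentOf(sourcePath);
    }

    // Parent of an absolute path, e.g. "/tmp/a" -> "/tmp", "/a" -> "/".
    static String parentOf(String path) {
        int idx = path.lastIndexOf('/');
        return idx <= 0 ? "/" : path.substring(0, idx);
    }

    // Key under which a file is listed: its path relative to the root.
    static String relativeKey(String root, String path) {
        return "/".equals(root) ? path : path.substring(root.length());
    }

    static String listingKey(String file, boolean targetPathExists,
                             boolean targetIsFile) {
        String root = computeSourceRootPath(file, targetPathExists, targetIsFile);
        return relativeKey(root, file);
    }

    public static void main(String[] args) {
        String file = "/tmp/a";
        // Copy listing: the target is the real /tmp/a, an existing file.
        String sourceKey = listingKey(file, true, true);
        // Delete pass with /NONE as target: isFile is false for a missing path.
        String targetKey = listingKey(file, true, false);
        // Delete pass with the actual destination, as in the fix.
        String fixedKey = listingKey(file, true, true);

        System.out.println("source key: '" + sourceKey + "'");
        System.out.println("target key with /NONE: '" + targetKey + "'");
        System.out.println("target key with real dest: '" + fixedKey + "'");
        // A target key with no matching source key is treated as missing.
        System.out.println("deleted with /NONE: " + !sourceKey.equals(targetKey));
        System.out.println("deleted with real dest: " + !sourceKey.equals(fixedKey));
    }
}
```

Under these assumptions the two listings disagree on the key for the same file only when `/NONE` is the target, which matches the deletion reported in the issue.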