[ 
https://issues.apache.org/jira/browse/HDFS-16145?focusedWorklogId=627818&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-627818
 ]

ASF GitHub Bot logged work on HDFS-16145:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Jul/21 15:29
            Start Date: 26/Jul/21 15:29
    Worklog Time Spent: 10m 
      Work Description: sodonnel commented on a change in pull request #3234:
URL: https://github.com/apache/hadoop/pull/3234#discussion_r676713165



##########
File path: 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
##########
@@ -563,10 +589,27 @@ private Path translateRenamedPath(Path sourcePath,
     } else {
       List<DiffInfo> renameDiffsList =
           diffMap.get(SnapshotDiffReport.DiffType.RENAME);
+      List<DiffInfo> deletedDirDiffsList =
+          diffMap.get(SnapshotDiffReport.DiffType.DELETE);

Review comment:
       Will this list hold every file and directory that has been deleted? 
For example, if I delete a directory with 1000 entries, will the list end up 
with 1001 entries?
   
   Then in `isParentOrSelfMarkedDeleted()` we need to scan this list. Could 
the list become very large and cause a performance problem when it is scanned 
again for each entry in the diffList?
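
   One possible way to sidestep the repeated scan, whatever the size of the 
list turns out to be, is to index the deleted paths in a hash set once and 
then walk up the ancestors of each entry. This is only a sketch under 
assumptions: `DeletedPathIndex` is a hypothetical helper that is not part of 
`DistCpSync`, paths are treated as plain absolute strings rather than `Path` 
or `DiffInfo` objects, and the method name is borrowed from the PR purely for 
illustration.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper (not part of DistCpSync): indexes deleted paths once
 * so each lookup costs O(path depth) instead of O(number of deleted entries).
 * Assumes absolute, '/'-separated paths with no trailing slash.
 */
final class DeletedPathIndex {
  private final Set<String> deleted;

  DeletedPathIndex(Collection<String> deletedPaths) {
    // Built once, before iterating over the diff entries.
    this.deleted = new HashSet<>(deletedPaths);
  }

  /** Returns true if the path itself or any ancestor is in the deleted set. */
  boolean isParentOrSelfMarkedDeleted(String path) {
    String current = path;
    while (current != null && !current.isEmpty()) {
      if (deleted.contains(current)) {
        return true;
      }
      current = parentOf(current);
    }
    return false;
  }

  /** Returns the parent path, "/" for a top-level entry, or null for "/". */
  private static String parentOf(String path) {
    int slash = path.lastIndexOf('/');
    if (slash < 0 || path.length() == 1) {
      return null; // relative name or the root itself: nothing above it
    }
    return slash == 0 ? "/" : path.substring(0, slash);
  }
}
```

   With this shape, the cost per diff entry depends on the depth of the path 
rather than on how many entries the DELETE list holds, at the price of keeping 
the set in memory.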




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 627818)
    Time Spent: 1h  (was: 50m)

> CopyListing fails with FNF exception with snapshot diff
> -------------------------------------------------------
>
>                 Key: HDFS-16145
>                 URL: https://issues.apache.org/jira/browse/HDFS-16145
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: distcp
>            Reporter: Shashikant Banerjee
>            Assignee: Shashikant Banerjee
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Distcp with snapshot diff and filters marks a rename as a delete 
> operation on the target if the rename destination is a directory that is 
> excluded by the filter. However, files/subdirs that were created/modified 
> after the old snapshot but before the rename will still be present as 
> modified/created entries in the final copy list. Since the parent directory 
> is marked for deletion, these create/modify entries should be 
> ignored while building the final copy list. 
> In such cases, when the final copy list is built, distcp tries to look 
> up each created/modified file in the newer snapshot, which fails 
> because the parent dir has already been moved to a new location in the later snapshot.
>  
> {code:java}
> sudo -u kms hadoop key create testkey
> hadoop fs -mkdir -p /data/gcgdlknnasg/
> hdfs crypto -createZone -keyName testkey -path /data/gcgdlknnasg/
> hadoop fs -mkdir -p /dest/gcgdlknnasg
> hdfs crypto -createZone -keyName testkey -path /dest/gcgdlknnasg
> hdfs dfs -mkdir /data/gcgdlknnasg/dir1
> hdfs dfsadmin -allowSnapshot /data/gcgdlknnasg/ 
> hdfs dfsadmin -allowSnapshot /dest/gcgdlknnasg/ 
> [root@nightly62x-1 logs]# hdfs dfs -ls -R /data/gcgdlknnasg/
> drwxrwxrwt   - hdfs supergroup          0 2021-07-16 14:05 
> /data/gcgdlknnasg/.Trash
> drwxr-xr-x   - hdfs supergroup          0 2021-07-16 13:07 
> /data/gcgdlknnasg/dir1
> [root@nightly62x-1 logs]# hdfs dfs -ls -R /dest/gcgdlknnasg/
> [root@nightly62x-1 logs]#
> hdfs dfs -put /etc/hosts /data/gcgdlknnasg/dir1/
> hdfs dfs -rm -r /data/gcgdlknnasg/dir1/
> hdfs dfs -mkdir /data/gcgdlknnasg/dir1/
> ===> Run BDR with “Abort on Snapshot Diff Failures” CHECKED in the 
> replication schedule. The BDR job then fails with the error below.
> 21/07/16 15:02:30 INFO distcp.DistCp: Failed to use snapshot diff - 
> java.io.FileNotFoundException: File does not exist: 
> /data/gcgdlknnasg/.snapshot/distcp-5-46485360-new/dir1/hosts
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1494)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1487)
> ……..
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

