[ https://issues.apache.org/jira/browse/HDFS-16145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shashikant Banerjee updated HDFS-16145:
---------------------------------------
Description:
DistCp with snapshot diff and filters marks a rename as a delete operation on the target if the rename target is a directory excluded by the filter. However, files and subdirectories that were created or modified after the old snapshot, but before the rename, still appear as created/modified entries in the final copy list. Since their parent directory is marked for deletion, these create/modify entries should be ignored while building the final copy list.

In such cases, when the final copy list is built, DistCp looks up each created/modified file in the newer snapshot. The lookup fails because the parent directory has already moved to a new location in the later snapshot.

In the reproduction below, hdfs dfs -rm -r presumably moves dir1 into the zone's .Trash directory, which is a rename into a path excluded by the replication filter; dir1/hosts, created after the old snapshot, is then looked up in the newer snapshot and not found.

{code:java}
sudo -u kms hadoop key create testkey
hadoop fs -mkdir -p /data/gcgdlknnasg/
hdfs crypto -createZone -keyName testkey -path /data/gcgdlknnasg/
hadoop fs -mkdir -p /dest/gcgdlknnasg
hdfs crypto -createZone -keyName testkey -path /dest/gcgdlknnasg
hdfs dfs -mkdir /data/gcgdlknnasg/dir1
hdfs dfsadmin -allowSnapshot /data/gcgdlknnasg/
hdfs dfsadmin -allowSnapshot /dest/gcgdlknnasg/
[root@nightly62x-1 logs]# hdfs dfs -ls -R /data/gcgdlknnasg/
drwxrwxrwt - hdfs supergroup 0 2021-07-16 14:05 /data/gcgdlknnasg/.Trash
drwxr-xr-x - hdfs supergroup 0 2021-07-16 13:07 /data/gcgdlknnasg/dir1
[root@nightly62x-1 logs]# hdfs dfs -ls -R /dest/gcgdlknnasg/
[root@nightly62x-1 logs]#
hdfs dfs -put /etc/hosts /data/gcgdlknnasg/dir1/
hdfs dfs -rm -r /data/gcgdlknnasg/dir1/
hdfs dfs -mkdir /data/gcgdlknnasg/dir1/
===> Run BDR with “Abort on Snapshot Diff Failures” CHECKED in the replication schedule.
     The BDR job fails with the error below:
21/07/16 15:02:30 INFO distcp.DistCp: Failed to use snapshot diff - java.io.FileNotFoundException: File does not exist: /data/gcgdlknnasg/.snapshot/distcp-5-46485360-new/dir1/hosts
	at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1494)
	at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1487)
	……..
{code}

> CopyListing fails with FNF exception with snapshot diff
> -------------------------------------------------------
>
> Key: HDFS-16145
> URL: https://issues.apache.org/jira/browse/HDFS-16145
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: distcp
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
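The pruning rule described in this issue (a rename into a filtered directory becomes a delete, and create/modify entries below that deleted directory are skipped) can be sketched in Java as follows. This is a hypothetical, simplified model, not the actual DistCp code or the fix attached to this issue: DiffKind, DiffEntry, buildCopyList and isStrictlyUnder are illustrative names, a single regex stands in for the DistCp filter, and entries are matched purely by path relative to the snapshot root.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the copy-list pruning discussed in HDFS-16145.
 * All names are illustrative; this is not the DistCp or SnapshotDiffReport API.
 * Paths are relative to the snapshot root.
 */
public class CopyListPruneSketch {

    enum DiffKind { CREATE, MODIFY, DELETE, RENAME }

    static final class DiffEntry {
        final DiffKind kind;
        final String path;   // source path of the change
        final String target; // rename target; null for other kinds

        DiffEntry(DiffKind kind, String path, String target) {
            this.kind = kind;
            this.path = path;
            this.target = target;
        }

        @Override
        public String toString() {
            return target == null ? kind + " " + path : kind + " " + path + " -> " + target;
        }
    }

    /** True if path lies strictly below dir (dir itself does not count). */
    static boolean isStrictlyUnder(String path, String dir) {
        return path.startsWith(dir + "/");
    }

    static List<DiffEntry> buildCopyList(List<DiffEntry> diff, Pattern excluded) {
        // Pass 1: a rename whose target matches the exclusion filter is treated
        // as a delete of its source on the target; remember every deleted path.
        List<DiffEntry> normalized = new ArrayList<>();
        List<String> deleted = new ArrayList<>();
        for (DiffEntry e : diff) {
            DiffEntry n = e;
            if (e.kind == DiffKind.RENAME && excluded.matcher(e.target).matches()) {
                n = new DiffEntry(DiffKind.DELETE, e.path, null);
            }
            if (n.kind == DiffKind.DELETE) {
                deleted.add(n.path);
            }
            normalized.add(n);
        }
        // Pass 2: skip CREATE/MODIFY entries below a deleted directory; under
        // their old path they no longer exist in the newer snapshot.
        List<DiffEntry> result = new ArrayList<>();
        for (DiffEntry e : normalized) {
            boolean skip = false;
            if (e.kind == DiffKind.CREATE || e.kind == DiffKind.MODIFY) {
                for (String dir : deleted) {
                    if (isStrictlyUnder(e.path, dir)) {
                        skip = true;
                        break;
                    }
                }
            }
            if (!skip) {
                result.add(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative entries loosely modelled on the reproduction steps.
        List<DiffEntry> diff = Arrays.asList(
            new DiffEntry(DiffKind.RENAME, "dir1", ".Trash/dir1"),
            new DiffEntry(DiffKind.CREATE, "dir1/hosts", null),
            new DiffEntry(DiffKind.CREATE, "dir1", null),
            new DiffEntry(DiffKind.MODIFY, "dir10/file", null));
        Pattern excluded = Pattern.compile("\\.Trash(/.*)?");
        for (DiffEntry e : buildCopyList(diff, excluded)) {
            System.out.println(e);
        }
    }
}
{code}

Running main prints DELETE dir1, CREATE dir1 and MODIFY dir10/file: the rename into .Trash becomes a delete, and CREATE dir1/hosts, the kind of entry whose lookup fails in the log above, is dropped. The strict-prefix check keeps the recreated dir1 itself and does not confuse dir10 with dir1. Because matching is purely by path, this sketch would also drop files created inside a recreated dir1; a real fix needs to tell the old and new directories apart.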