[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747560#comment-17747560 ]
ASF GitHub Bot commented on HDFS-17120:
---------------------------------------

sadanand48 commented on code in PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#discussion_r1275096942


##########
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java:
##########
@@ -167,6 +169,77 @@ public void testDuplicates() {
     }
   }
+  @Test(expected = DuplicateFileException.class, timeout = 10000)
+  public void testDiffBasedSimpleCopyListing() throws IOException {
+    FileSystem fs = null;
+    Configuration configuration = getConf();
+    DistCpSync distCpSync = Mockito.mock(DistCpSync.class);
+    Path listingFile = new Path("/tmp/list");
+    // Throws DuplicateFileException as it recursively traverses src3 directory
+    // and also adds 3.txt,4.txt twice
+    configuration.setBoolean(
+        DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_TRAVERSE_DIRECTORY, true);
+    try {
+      fs = FileSystem.get(getConf());
+      buildListingUsingSnapshotDiff(fs, configuration, distCpSync, listingFile);
+    } catch (IOException e) {
+      LOG.error("Exception encountered in test", e);

Review Comment:
   done.

##########
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java:
##########
@@ -167,6 +169,77 @@ public void testDuplicates() {
     }
   }
+  @Test(expected = DuplicateFileException.class, timeout = 10000)
+  public void testDiffBasedSimpleCopyListing() throws IOException {
+    FileSystem fs = null;
+    Configuration configuration = getConf();
+    DistCpSync distCpSync = Mockito.mock(DistCpSync.class);
+    Path listingFile = new Path("/tmp/list");
+    // Throws DuplicateFileException as it recursively traverses src3 directory
+    // and also adds 3.txt,4.txt twice
+    configuration.setBoolean(
+        DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_TRAVERSE_DIRECTORY, true);
+    try {
+      fs = FileSystem.get(getConf());
+      buildListingUsingSnapshotDiff(fs, configuration, distCpSync, listingFile);
+    } catch (IOException e) {
+      LOG.error("Exception encountered in test", e);
+      Assert.fail("Test failed " + e.getMessage());
+    } finally {
+      TestDistCpUtils.delete(fs, "/tmp");
+    }
+  }
+
+  @Test(timeout=10000)
+  public void testDiffBasedSimpleCopyListingWithoutTraverseDirectory() {
+    FileSystem fs = null;
+    Configuration configuration = getConf();
+    DistCpSync distCpSync = Mockito.mock(DistCpSync.class);
+    Path listingFile = new Path("/tmp/list");
+    // no exception expected in this case
+    configuration.setBoolean(
+        DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_TRAVERSE_DIRECTORY, false);
+    try {
+      fs = FileSystem.get(getConf());
+      buildListingUsingSnapshotDiff(fs, configuration, distCpSync, listingFile);
+    } catch (IOException e) {
+      LOG.error("Exception encountered in test", e);

Review Comment:
   done.




> Support snapshot diff based copylisting for flat paths.
> -------------------------------------------------------
>
>                 Key: HDFS-17120
>                 URL: https://issues.apache.org/jira/browse/HDFS-17120
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Sadanand Shenoy
>            Assignee: Sadanand Shenoy
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, the diff-based copyListing used during the distcpSync step of an
> incremental copy uses the SimpleCopyListing implementation by default. That
> implementation iterates through the DiffReport, and if the DiffType is CREATE
> and the path is a directory, it recursively traverses the directory and adds
> its subpaths to the resulting copyList.
> This works fine for snapshotDiff implementations that include only
> top-level directories in their DiffReport. If a snapshotDiff implementation
> instead outputs flat paths, listing both a directory and its subpaths in its
> DiffReport, each subpath is added once by the traversal and again from its
> own diff entry. This produces duplicate paths in the copyList, and a
> DuplicateFileException is thrown.
>
> For example, the Ozone filesystem implementation of snapdiff between two
> snapshots shows all subpaths as part of the diff.
> {code:java}
> [~]# ozone sh snapshot create vol11/buck1 snap1
> [~]# ozone sh snapshot create vol11/buck2 snap1
> [~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1
> [~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1/dir11
> [~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1/dir11/dir111
> [~]# ozone sh snapshot create vol11/buck1 snap2
> [~]# ozone sh snapshot diff vol11/buck1 snap1 snap2
> Difference between snapshot: snap1 and snapshot: snap2
> + ./dir1
> + ./dir1/dir11
> + ./dir1/dir11/dir111
> {code}
> Even though dir11 and dir111 are subpaths of dir1, they are listed
> separately in the snapdiff. This is not the case for HDFS.
> This Jira aims to create a new copyListing implementation for diff-based
> copyListing that doesn't traverse directories but only adds the paths that
> are present in the diff.
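To make the failure mode in the issue description concrete, here is a minimal, self-contained Java sketch. It is a toy model, not DistCp code: the class, methods, and exception (FlatDiffListingSketch, traversingListing, flatListing, DuplicatePathException) are hypothetical, the directory tree is an in-memory map standing in for a FileSystem, and every diff entry is assumed to be a CREATE. It contrasts a traversing listing (the SimpleCopyListing behaviour described above) with a flat listing that only adds the diffed paths.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of diff-based copy listing; not the DistCp API.
class FlatDiffListingSketch {

  /** Hypothetical stand-in for DistCp's DuplicateFileException. */
  static class DuplicatePathException extends RuntimeException {
    DuplicatePathException(String path) {
      super("Duplicate path in copy listing: " + path);
    }
  }

  /** Adds each created path and, for directories, its whole subtree. */
  static List<String> traversingListing(List<String> createdPaths,
                                        Map<String, List<String>> children) {
    Set<String> seen = new LinkedHashSet<>(); // keeps insertion order
    for (String path : createdPaths) {
      addRecursively(path, children, seen);
    }
    return new ArrayList<>(seen);
  }

  private static void addRecursively(String path,
                                     Map<String, List<String>> children,
                                     Set<String> seen) {
    // A path that was already added (e.g. by traversing its parent) is a duplicate.
    if (!seen.add(path)) {
      throw new DuplicatePathException(path);
    }
    for (String child : children.getOrDefault(path, List.of())) {
      addRecursively(child, children, seen);
    }
  }

  /** Adds only the paths that appear in the diff; no directory traversal. */
  static List<String> flatListing(List<String> createdPaths) {
    Set<String> seen = new LinkedHashSet<>();
    for (String path : createdPaths) {
      if (!seen.add(path)) {
        throw new DuplicatePathException(path);
      }
    }
    return new ArrayList<>(seen);
  }

  public static void main(String[] args) {
    // Directory tree after snap2 in the Ozone example (directories only).
    Map<String, List<String>> children = Map.of(
        "./dir1", List.of("./dir1/dir11"),
        "./dir1/dir11", List.of("./dir1/dir11/dir111"));

    // HDFS-style diff: only the top-level created directory is reported.
    List<String> topLevelDiff = List.of("./dir1");
    // Ozone-style flat diff: every created path is reported.
    List<String> flatDiff =
        List.of("./dir1", "./dir1/dir11", "./dir1/dir11/dir111");

    System.out.println(traversingListing(topLevelDiff, children));
    System.out.println(flatListing(flatDiff));
    try {
      traversingListing(flatDiff, children); // traversal re-adds dir11
    } catch (DuplicatePathException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Running main prints the same three-path listing twice, then the duplicate-path message for ./dir1/dir11. The flat variant trusts the diff to list every created path, which holds for the Ozone snapdiff shown above but not for an HDFS-style diff, where it would miss dir11 and dir111. That trade-off is presumably why the PR's tests toggle DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_TRAVERSE_DIRECTORY rather than dropping traversal outright.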
--
This message was sent by Atlassian Jira
(v8.20.10#820010)