Sadanand Shenoy created HDFS-17120:
--------------------------------------

             Summary: Support snapshot diff based copylisting for flat paths.
                 Key: HDFS-17120
                 URL: https://issues.apache.org/jira/browse/HDFS-17120
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Sadanand Shenoy
            Assignee: Sadanand Shenoy


Currently, diff-based copyListing, which is used during the distcpSync step of 
an incremental copy, uses the SimpleCopyListing implementation by default. 
That implementation iterates through the DiffReport and, if the DiffType is 
Create and the path is a directory, recursively traverses the directory and 
adds its subpaths to the resultant copyList.

This works fine for snapshotDiff implementations that include only top-level 
directories in their DiffReport. If a snapshotDiff implementation instead 
outputs flat paths, so that both a directory and its sub-directories appear 
in the DiffReport, the traversal adds the same subpaths a second time. This 
leads to duplicate paths in the copyList, and a DuplicateFileException is 
thrown.
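
The duplication can be reproduced with a small toy model. This is only a 
sketch under stated assumptions: plain strings stand in for paths and 
DiffReport entries, the in-memory TREE map is hypothetical, and none of the 
names below are the actual SimpleCopyListing code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model only: plain strings stand in for Hadoop paths and DiffReport entries.
class RecursiveListingSketch {

    // Hypothetical in-memory directory tree: directory path -> direct children.
    static final Map<String, List<String>> TREE = Map.of(
        "dir1", List.of("dir1/dir11"),
        "dir1/dir11", List.of("dir1/dir11/dir111"),
        "dir1/dir11/dir111", List.of());

    // Adds each created path, then walks its subtree, as described above.
    static List<String> recursiveListing(List<String> createdPaths) {
        List<String> copyList = new ArrayList<>();
        for (String path : createdPaths) {
            copyList.add(path);
            addDescendants(path, copyList);
        }
        return copyList;
    }

    // Depth-first walk that appends every descendant of dir to copyList.
    static void addDescendants(String dir, List<String> copyList) {
        for (String child : TREE.getOrDefault(dir, List.of())) {
            copyList.add(child);
            addDescendants(child, copyList);
        }
    }

    public static void main(String[] args) {
        // Top-level-only diff: each path is listed once.
        System.out.println(recursiveListing(List.of("dir1")));
        // Flat diff: dir11 and dir111 are listed more than once.
        System.out.println(recursiveListing(
            List.of("dir1", "dir1/dir11", "dir1/dir11/dir111")));
    }
}
```

With the flat diff, the subtree of dir1 is added once by the traversal and 
again when the loop reaches the diff entries for dir11 and dir111.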

 

For example, the Ozone filesystem implementation of snapdiff between 2 
snapshots shows all subpaths as part of the diff.
{code:java}
[~]# ozone sh snapshot create vol11/buck1 snap1
[~]# ozone sh snapshot create vol11/buck2 snap1
[~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1
[~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1/dir11
[~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1/dir11/dir111
[~]# ozone sh snapshot create vol11/buck1 snap2
[~]# ozone sh snapshot diff vol11/buck1 snap1 snap2
Difference between snapshot: snap1 and snapshot: snap2
+    ./dir1
+    ./dir1/dir11
+    ./dir1/dir11/dir111
{code}
We can see that even though dir11 and dir111 are subpaths of dir1, they are 
present in the snapdiff. This is not the case for HDFS, though.

This Jira aims to create a new copyListing implementation for diff-based 
copyListing that doesn't traverse directories and only adds the paths that 
are present in the diff.
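
A minimal sketch of the intended behaviour, using a toy model rather than 
the real copyListing API: plain strings stand in for DiffReport entries, and 
FlatDiffListingSketch and flatListing are hypothetical names chosen for 
illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Toy model only: plain strings stand in for Hadoop paths and DiffReport entries.
class FlatDiffListingSketch {

    // Adds exactly the paths reported as created, without walking any directory.
    // The LinkedHashSet keeps diff order and drops a path if the diff repeats it.
    static List<String> flatListing(List<String> createdPaths) {
        return new ArrayList<>(new LinkedHashSet<>(createdPaths));
    }

    public static void main(String[] args) {
        List<String> diff = List.of("dir1", "dir1/dir11", "dir1/dir11/dir111");
        System.out.println(flatListing(diff));
    }
}
```

Because nothing is traversed, a flat diff such as the Ozone output above 
yields each path once. A listing like this would presumably miss subpaths 
if the diff reported only top-level directories, so it fits only snapshotDiff 
implementations that emit flat paths.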



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

