[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17746608#comment-17746608 ]

ASF GitHub Bot commented on HDFS-17120:
---------------------------------------

sadanand48 opened a new pull request, #5885:
URL: https://github.com/apache/hadoop/pull/5885

   ### Description of PR
   Currently, diff-based copyListing, which is used during the distcpSync step
   of an incremental copy, uses the SimpleCopyListing implementation by default.
   That implementation iterates through the DiffReport and, if the DiffType is
   Create and the path is a directory, recursively traverses the directory and
   adds its subpaths to the resulting copyList.
   
   This PR adds a copyListing implementation that considers only the flat paths
   in the snapshotDiff report and does not traverse directories recursively.
   Existing behaviour is unchanged: the default copyListing implementation for
   diff-based copy remains SimpleCopyListing, and it can be overridden through a
   config if desired.
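
The following is a minimal Java sketch of the two strategies. It assumes a simplified in-memory model. `DiffEntry`, `recursiveListing`, `flatListing` and `duplicates` are hypothetical names, not the Hadoop CopyListing or SnapshotDiffReport APIs, and the `tree` map stands in for listing directories on the source file system. The sketch is fed an Ozone-style diff that already names every created subpath, like the Ozone example in the issue description. With that input, the recursive strategy lists `/dir1/dir11` and `/dir1/dir11/dir111` twice, while the flat strategy lists each path once.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of diff-based copy listing. It does not use the
 * real Hadoop CopyListing, SimpleCopyListing or SnapshotDiffReport classes; the
 * in-memory "tree" map stands in for listing directories on the source FS.
 */
public class FlatDiffListingSketch {

  /** One Create entry from a snapshot diff report (simplified). */
  public record DiffEntry(String path, boolean isDirectory) {}

  /** Recursive behaviour: a created directory is expanded to all descendants. */
  public static List<String> recursiveListing(List<DiffEntry> diff,
                                              Map<String, List<String>> tree) {
    List<String> copyList = new ArrayList<>();
    for (DiffEntry e : diff) {
      copyList.add(e.path());
      if (e.isDirectory()) {
        addDescendants(e.path(), tree, copyList);
      }
    }
    return copyList;
  }

  private static void addDescendants(String dir, Map<String, List<String>> tree,
                                     List<String> out) {
    for (String child : tree.getOrDefault(dir, List.of())) {
      out.add(child);
      addDescendants(child, tree, out);
    }
  }

  /** Flat behaviour: copy only the paths the diff report names, no traversal. */
  public static List<String> flatListing(List<DiffEntry> diff) {
    List<String> copyList = new ArrayList<>();
    for (DiffEntry e : diff) {
      copyList.add(e.path());
    }
    return copyList;
  }

  /** Paths listed more than once; DistCp rejects such a listing. */
  public static Set<String> duplicates(List<String> copyList) {
    Set<String> seen = new HashSet<>();
    Set<String> dups = new LinkedHashSet<>();
    for (String p : copyList) {
      if (!seen.add(p)) {
        dups.add(p);
      }
    }
    return dups;
  }

  public static void main(String[] args) {
    // Directory tree created between snap1 and snap2 in the Ozone example.
    Map<String, List<String>> tree = Map.of(
        "/dir1", List.of("/dir1/dir11"),
        "/dir1/dir11", List.of("/dir1/dir11/dir111"));
    // Ozone-style diff: every created subpath is reported, not just /dir1.
    List<DiffEntry> ozoneDiff = List.of(
        new DiffEntry("/dir1", true),
        new DiffEntry("/dir1/dir11", true),
        new DiffEntry("/dir1/dir11/dir111", true));

    System.out.println("recursive: " + duplicates(recursiveListing(ozoneDiff, tree)));
    System.out.println("flat:      " + duplicates(flatListing(ozoneDiff)));
  }
}
```

Running `main` prints the duplicated paths for each strategy. Only the recursive strategy reports any, which mirrors the DuplicateFileException described in the issue.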
   
   https://issues.apache.org/jira/browse/HDFS-17120
   
   ### How was this patch tested?
   ### For code changes:
   Added unit tests.
   
   




> Support snapshot diff based copylisting for flat paths.
> -------------------------------------------------------
>
>                 Key: HDFS-17120
>                 URL: https://issues.apache.org/jira/browse/HDFS-17120
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Sadanand Shenoy
>            Assignee: Sadanand Shenoy
>            Priority: Major
>
> Currently, diff-based copyListing, which is used during the distcpSync step of
> an incremental copy, uses the SimpleCopyListing implementation by default. That
> implementation iterates through the DiffReport and, if the DiffType is Create
> and the path is a directory, recursively traverses the directory and adds its
> subpaths to the resulting copyList.
> This works for snapshotDiff implementations whose DiffReport includes only the
> top-level directories. If a snapshotDiff implementation instead outputs flat
> paths, listing both a directory and its subdirectories in its DiffReport, the
> recursive traversal adds those subpaths a second time. This produces duplicate
> paths in the copyList and a DuplicateFileException.
>  
> For example, the Ozone filesystem implementation of snapshot diff between two
> snapshots reports every subpath as part of the diff:
> {code}
> [~]# ozone sh snapshot create vol11/buck1 snap1
> [~]# ozone sh snapshot create vol11/buck2 snap1
> [~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1
> [~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1/dir11
> [~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1/dir11/dir111
> [~]# ozone sh snapshot create vol11/buck1 snap2
> [~]# ozone sh snapshot diff vol11/buck1 snap1 snap2
> Difference between snapshot: snap1 and snapshot: snap2
> +    ./dir1
> +    ./dir1/dir11
> +    ./dir1/dir11/dir111
> {code}
> Even though dir11 and dir111 are subpaths of dir1, they appear in the snapshot
> diff. This is not the case for HDFS.
> This Jira aims to create a new copyListing implementation for diff-based copy
> listing that does not traverse directories and adds only the paths present in
> the diff.
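
The flat approach is only safe when the diff already names every created subpath, which is presumably why SimpleCopyListing remains the default. The minimal Java sketch below assumes the same simplified in-memory model. The class and method names are hypothetical, and the boolean `diffListsAllSubpaths` stands in for the configuration override. It is not a real DistCp property.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of choosing between recursive and flat diff-based
 * listing. The boolean flag stands in for the config override; it is not the
 * real DistCp property name, and no Hadoop classes are used.
 */
public class ListingChoiceSketch {

  /** Expands each listed path to itself plus all of its descendants. */
  public static List<String> recursive(List<String> diffPaths,
                                       Map<String, List<String>> tree) {
    List<String> out = new ArrayList<>();
    for (String p : diffPaths) {
      out.add(p);
      expand(p, tree, out);
    }
    return out;
  }

  private static void expand(String dir, Map<String, List<String>> tree,
                             List<String> out) {
    for (String child : tree.getOrDefault(dir, List.of())) {
      out.add(child);
      expand(child, tree, out);
    }
  }

  /** Lists exactly the paths named in the diff, without traversal. */
  public static List<String> flat(List<String> diffPaths) {
    return new ArrayList<>(diffPaths);
  }

  /** Hypothetical selector: flat only when the diff lists every subpath. */
  public static List<String> copyList(List<String> diffPaths,
                                      Map<String, List<String>> tree,
                                      boolean diffListsAllSubpaths) {
    return diffListsAllSubpaths ? flat(diffPaths) : recursive(diffPaths, tree);
  }

  public static void main(String[] args) {
    Map<String, List<String>> tree = Map.of(
        "/dir1", List.of("/dir1/dir11"),
        "/dir1/dir11", List.of("/dir1/dir11/dir111"));
    // HDFS-style diff: only the top-level created directory is reported.
    List<String> hdfsDiff = List.of("/dir1");
    // Ozone-style diff: every created subpath is reported.
    List<String> ozoneDiff = List.of("/dir1", "/dir1/dir11", "/dir1/dir11/dir111");

    System.out.println(copyList(hdfsDiff, tree, false)); // recursive expansion
    System.out.println(copyList(ozoneDiff, tree, true)); // flat listing
    System.out.println(flat(hdfsDiff));                  // flat on HDFS-style diff
  }
}
```

When each strategy is matched to its diff style, both yield the same three paths. In this model, applying the flat listing to an HDFS-style diff that reports only `/dir1` would skip `/dir1/dir11` and `/dir1/dir11/dir111`.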



