[ 
https://issues.apache.org/jira/browse/HDFS-13916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17367960#comment-17367960
 ] 

Wei-Chiu Chuang commented on HDFS-13916:
----------------------------------------

Looks fine to me.

I still hold the opinion that we should ultimately support the 
getSnapshotDiffReportListing API instead. As an example: 
https://github.com/apache/hadoop/blob/eefa664fea1119a9c6e3ae2d2ad3069019fbd4ef/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java#L2391-L2413

When there are millions of diffs between two snapshots, the old 
getSnapshotDiffReport() doesn't scale. The NameNode ends up building huge RPC 
messages for the snapshot diff items, which creates GC pressure, and the 
client application sees big memory spikes too.
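
To illustrate why a listing-style API helps, here is a minimal, stdlib-only 
sketch of cursor-based paging. It is not Hadoop code: FakeNameNode, DiffPage, 
getDiffListing and the "-1 means done" cursor convention are all hypothetical 
names and assumptions made for this example. The point is only that each 
response stays bounded by the page size, no matter how large the full diff is.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of paged snapshot diff retrieval.
// None of these types are real Hadoop classes.
class PagedSnapshotDiffSketch {

    /** One page of diff entries plus the cursor for the next call (-1 when done). */
    static final class DiffPage {
        final List<String> entries;
        final int nextIndex;

        DiffPage(List<String> entries, int nextIndex) {
            this.entries = entries;
            this.nextIndex = nextIndex;
        }
    }

    /** Stand-in for the server: holds the full diff but only serves bounded pages. */
    static final class FakeNameNode {
        private final List<String> allEntries;
        private final int maxPageSize;
        int largestResponse = 0; // biggest single response built so far

        FakeNameNode(List<String> allEntries, int maxPageSize) {
            this.allEntries = allEntries;
            this.maxPageSize = maxPageSize;
        }

        DiffPage getDiffListing(int startIndex) {
            int end = Math.min(startIndex + maxPageSize, allEntries.size());
            List<String> page = new ArrayList<>(allEntries.subList(startIndex, end));
            largestResponse = Math.max(largestResponse, page.size());
            int next = end < allEntries.size() ? end : -1;
            return new DiffPage(page, next);
        }
    }

    /** Client loop: request pages until the server reports there are no more. */
    static int countDiffEntries(FakeNameNode nn) {
        int total = 0;
        int index = 0;
        while (index != -1) {
            DiffPage page = nn.getDiffListing(index);
            total += page.entries.size(); // process the page, then let it be collected
            index = page.nextIndex;
        }
        return total;
    }

    /** Builds a fake server holding entryCount synthetic diff entries. */
    static FakeNameNode fakeNameNode(int entryCount, int pageSize) {
        List<String> entries = new ArrayList<>();
        for (int i = 0; i < entryCount; i++) {
            entries.add("M\t./file-" + i);
        }
        return new FakeNameNode(entries, pageSize);
    }

    public static void main(String[] args) {
        FakeNameNode nn = fakeNameNode(10_000, 1_000);
        int total = countDiffEntries(nn);
        System.out.println(total + " entries, largest response " + nn.largestResponse);
    }
}
```

With a single-shot call, the largest response would equal the total number of 
diff entries. In this sketch it never exceeds the page size, and the client 
can process and drop each page before asking for the next one.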

WebHDFS doesn't support the getSnapshotDiffReportListing API yet, though.

> Distcp SnapshotDiff to support WebHDFS
> --------------------------------------
>
>                 Key: HDFS-13916
>                 URL: https://issues.apache.org/jira/browse/HDFS-13916
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: distcp, webhdfs
>    Affects Versions: 3.0.1, 3.1.1
>            Reporter: Xun REN
>            Assignee: Xun REN
>            Priority: Major
>              Labels: easyfix, newbie, patch
>         Attachments: HDFS-13916.002.patch, HDFS-13916.003.patch, 
> HDFS-13916.004.patch, HDFS-13916.005.patch, HDFS-13916.006.patch, 
> HDFS-13916.007.patch, HDFS-13916.patch
>
>
> [~ljain] worked on HDFS-13052 to make it possible to run DistCp with 
> SnapshotDiff over WebHdfsFileSystem. However, that patch does not modify 
> the Java class that is actually used when launching the command 
> "hadoop distcp ..."
>  
> You can check in the latest version here:
> [https://github.com/apache/hadoop/blob/branch-3.1.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java#L96-L100]
> In the method "preSyncCheck" of the class "DistCpSync", we still check 
> whether the file system is DFS. 
> So I propose changing the class DistCpSync to take into account what 
> Lokesh Jain committed.


