[ https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982590#comment-15982590 ]
Benjamin Huo edited comment on HDFS-7535 at 4/25/17 9:01 AM:
-------------------------------------------------------------
I have one question about the following comment:
"This snapshot diff report represents the delta that should be applied to the
backup cluster. For changes like deletion and rename we can directly apply the
same operations (following some specific order based on their dependency) in
the backup cluster. For changes like creation, append, and other metadata
modification we keep using the functionality of the current distcp."
It is not clear to me what "we keep using the functionality of the current
distcp" means.
After the fix for HDFS-7535, is the list of created and modified files
generated from the difference between snapshots s1 and s2 on the source
cluster, or from a comparison of the source and destination clusters (with the
extra cost of transferring the file list between the source and target
clusters)?
Thanks
Ben
> Utilize Snapshot diff report for distcp
> ---------------------------------------
>
> Key: HDFS-7535
> URL: https://issues.apache.org/jira/browse/HDFS-7535
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: distcp, snapshots
> Reporter: Jing Zhao
> Assignee: Jing Zhao
> Fix For: 2.7.0
>
> Attachments: HDFS-7535.000.patch, HDFS-7535.001.patch,
> HDFS-7535.002.patch, HDFS-7535.003.patch, HDFS-7535.004.patch
>
>
> Currently the HDFS snapshot diff report can identify file/directory creation,
> deletion, rename and modification under a snapshottable directory. We can use
> the diff report for distcp between the primary cluster and a backup cluster
> to avoid unnecessary data copy. This is especially useful when a big
> directory is renamed in the primary cluster: the current distcp cannot
> detect the rename op, so the rename usually leads to a large amount of real
> data being copied.
> More details of the approach will come in the first comment.
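
To make the split described in the quoted comment concrete, here is a minimal sketch in Python. It is not the distcp implementation. It assumes the diff report has one entry per line, with a type marker (`+`, `-`, `M`, `R`) followed by whitespace and a path, and with renames written as `old -> new`. The names `SyncPlan` and `plan_from_diff` are hypothetical. The sketch only sorts entries into operations that could be replayed directly on the backup cluster (renames, deletions) and paths that would still go through the regular copy (creations, modifications); it does not attempt the dependency ordering the comment mentions.

```python
from typing import List, NamedTuple, Tuple


class SyncPlan(NamedTuple):
    """Hypothetical container for the two kinds of work in the delta."""
    renames: List[Tuple[str, str]]  # (old, new) pairs to replay on the backup
    deletes: List[str]              # paths to delete on the backup
    copies: List[str]               # created/modified paths left to the copy step


def plan_from_diff(report_lines):
    """Classify snapshot diff entries (assumed format: '<type> <path>')."""
    renames, deletes, copies = [], [], []
    for raw in report_lines:
        line = raw.strip()
        # Skip blank lines and the assumed "Difference between ..." header.
        if not line or line.startswith("Difference between"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError("unrecognized diff entry: %r" % raw)
        kind, rest = parts
        if kind == "R":
            # Assumed rename syntax: "old -> new".
            old, sep, new = rest.partition(" -> ")
            if not sep:
                raise ValueError("malformed rename entry: %r" % raw)
            renames.append((old, new))
        elif kind == "-":
            deletes.append(rest)
        elif kind in ("+", "M"):
            copies.append(rest)
        else:
            raise ValueError("unknown entry type %r in %r" % (kind, raw))
    return SyncPlan(renames, deletes, copies)


if __name__ == "__main__":
    sample = [
        "Difference between snapshot s1 and snapshot s2 under directory /data:",
        "M\t.",
        "+\t./new.txt",
        "-\t./old.txt",
        "R\t./a -> ./b",
    ]
    print(plan_from_diff(sample))
```

In this sketch the whole classification is derived from the diff between two snapshots on the source side; whether the actual patch also consults the destination cluster is exactly the question raised in the comment above and is not answered here.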