[ https://issues.apache.org/jira/browse/MAPREDUCE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919371#action_12919371 ]
dhruba borthakur commented on MAPREDUCE-2117:
---------------------------------------------
Doug, I agree. This is closer to a fully materialized snapshot than to a
true copy-on-write snapshot. If the data in each region is small and is
scattered across a relatively large set of machines, the fully materialized
approach works OK. Otherwise, the more performant copy-on-write snapshot would
be needed.
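
To make the distinction concrete, here is a rough toy sketch. It is not HDFS or
HBase code: the ToyBlockStore class and all of its method names are hypothetical,
and blocks are modelled as in-memory byte strings. It only shows that a fully
materialized snapshot pays for every byte up front, while a copy-on-write
snapshot shares blocks and pays only when data is later overwritten.

```python
# Toy model (hypothetical, not HDFS code): a "file" is a list of block IDs,
# and the store maps each block ID to its bytes.

class ToyBlockStore:
    def __init__(self):
        self.blocks = {}        # block_id -> bytes
        self.next_id = 0
        self.bytes_copied = 0   # bytes physically duplicated by snapshots

    def write_block(self, data):
        """Allocate a new block holding `data` and return its ID."""
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        return block_id

    def materialized_snapshot(self, file_blocks):
        """Copy every block up front; cost grows with the data size."""
        copy = []
        for block_id in file_blocks:
            data = self.blocks[block_id]
            self.bytes_copied += len(data)
            copy.append(self.write_block(data))
        return copy

    def cow_snapshot(self, file_blocks):
        """Share the existing blocks; only the block list is duplicated."""
        return list(file_blocks)

    def overwrite_block(self, file_blocks, index, data):
        """Copy-on-write update: write a new block, keep the old one intact."""
        file_blocks[index] = self.write_block(data)


store = ToyBlockStore()
region = [store.write_block(b"a" * 64), store.write_block(b"b" * 64)]

full = store.materialized_snapshot(region)   # duplicates both blocks now
cow = store.cow_snapshot(region)             # duplicates no block data

# A later write to the live region leaves both snapshots unchanged.
store.overwrite_block(region, 0, b"c" * 64)
```

In this model the materialized snapshot is cheap only when the blocks are small
(and, in a real cluster, when the copying is spread over many machines), which
is the trade-off described above.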
> Superfast Distcp when copying data within the same hdfs cluster
> ---------------------------------------------------------------
>
> Key: MAPREDUCE-2117
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2117
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: distcp
> Reporter: dhruba borthakur
>
> There are use cases where distcp is used to copy a set of files/directories
> from one part of the HDFS namespace to another within the same HDFS
> cluster. The copy could be made superfast if we instruct the relevant
> datanodes to make local replicas of the relevant blocks, keeping network
> usage to a minimum. This would be especially useful for letting HBase take
> a backup of a region with minimal downtime.