[jira] Commented: (MAPREDUCE-2117) Superfast Distcp when copying data within the same hdfs cluster

dhruba borthakur (JIRA) Fri, 08 Oct 2010 13:15:00 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919371#action_12919371
 ]


dhruba borthakur commented on MAPREDUCE-2117:
---------------------------------------------

Doug, I agree. This is more like a fully materialized snapshot than a 
true copy-on-write snapshot. If the data in each region is small and is 
scattered across a relatively large set of machines, the fully materialized 
approach works fine; otherwise, the more performant copy-on-write snapshot 
would be needed.
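
A rough way to see why the fully materialized approach is acceptable when data is small and well scattered: each machine copies only its own blocks, in parallel, so the slowest machine sets the wall-clock time. The sketch below is a back-of-the-envelope model only. The function name, node names, and throughput figure are assumptions for illustration, not measurements or Hadoop APIs.

```python
def materialized_copy_seconds(bytes_per_node: dict[str, int],
                              disk_bytes_per_sec: float) -> float:
    """Estimate wall-clock time for a fully materialized local copy.

    Assumes (hypothetically) that every node copies its own blocks in
    parallel at the same disk throughput, so the busiest node dominates.
    """
    if not bytes_per_node:
        return 0.0
    return max(bytes_per_node.values()) / disk_bytes_per_sec


total = 10_000_000_000   # 10 GB of region data (assumed figure)
rate = 100_000_000       # 100 MB/s local disk copy rate (assumed figure)

# Same data spread evenly over 100 nodes versus packed onto 2 nodes.
scattered = {f"node{i}": total // 100 for i in range(100)}
concentrated = {"node0": total // 2, "node1": total // 2}

scattered_secs = materialized_copy_seconds(scattered, rate)
concentrated_secs = materialized_copy_seconds(concentrated, rate)
```

Under these assumed numbers the scattered layout finishes far sooner than the concentrated one, which is the case where a copy-on-write snapshot would start to pay off.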

> Superfast Distcp when copying data within the same hdfs cluster
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-2117
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2117
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distcp
>            Reporter: dhruba borthakur
>
> There are use cases in which distcp is used to copy a set of files/directories 
> from one part of the HDFS namespace to another within the same HDFS 
> cluster. The copy could be very fast if we could instruct the relevant 
> datanodes to make local replicas of the relevant blocks, keeping network 
> usage to a minimum. This would be especially useful for letting HBase back 
> up a region with minimal downtime.
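
The proposal in the description can be pictured as a planning step: group the source blocks by the datanodes that already hold a replica, then ask each datanode to duplicate its own blocks locally. The sketch below is a toy model of that grouping only. `plan_local_copies`, the block ids, and the datanode names are hypothetical, and nothing here corresponds to an existing HDFS or distcp interface.

```python
from collections import defaultdict


def plan_local_copies(block_locations: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each datanode to the blocks it would copy locally.

    block_locations maps a block id to the datanodes holding a replica.
    Every replica is duplicated on the node that already stores it, so
    in this model no block data crosses the network.
    """
    plan = defaultdict(list)
    for block_id, nodes in block_locations.items():
        for node in nodes:
            plan[node].append(block_id)
    return dict(plan)


# Hypothetical replica placement for a two-block file.
locations = {"blk_1": ["dn1", "dn2"], "blk_2": ["dn2", "dn3"]}
plan = plan_local_copies(locations)
```

In this model the copied file keeps the same replica placement as the source, which is what keeps network usage to a minimum.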

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
