[ 
https://issues.apache.org/jira/browse/HADOOP-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reopened HADOOP-8143:
------------------------------------------

Chaps, would it be ok if we revisited this ask?

1. [~davet]'s original problem remains, i.e. copying files between two clusters 
with different default block-sizes will fail without either -pb or -skipCrc. 
HADOOP-8233 only solves this for 0-byte files.

2. File-formats such as ORC perform several optimizations based on data-stripe 
and HDFS-block sizes. If such files are copied between clusters without 
preserving block-sizes, the result is degraded performance (at best) or 
data-corruption (at worst).

Would it be acceptable to preserve block-sizes by default (i.e. even when -p isn't 
used), but only when both the source and target file-systems are HDFS?
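For context on point 1: distcp's post-copy verification compares HDFS file checksums, which are computed as an MD5 of per-block MD5s of per-chunk CRCs, so the digest depends on how the bytes are partitioned into blocks. The sketch below is a simplified toy model of that structure (real HDFS uses CRC32C and carries extra header fields in MD5MD5CRC32FileChecksum; the chunk size and function name here are illustrative assumptions), just to show that identical bytes written with different block-sizes yield different file checksums:

```python
import hashlib
import zlib

CHUNK = 512  # bytes per CRC chunk (HDFS's bytes-per-checksum default)

def hdfs_style_checksum(data: bytes, block_size: int) -> str:
    """Toy MD5-of-MD5-of-CRC32 file checksum, keyed by block partitioning.

    Simplified: real HDFS uses CRC32C and additional metadata, but the
    block-size dependence shown here is the same.
    """
    block_md5s = []
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        # Concatenate the CRCs of each 512-byte chunk within the block.
        crcs = b"".join(
            zlib.crc32(block[c:c + CHUNK]).to_bytes(4, "big")
            for c in range(0, len(block), CHUNK)
        )
        block_md5s.append(hashlib.md5(crcs).digest())
    # File checksum = MD5 over the per-block MD5s.
    return hashlib.md5(b"".join(block_md5s)).hexdigest()

data = bytes(range(256)) * 1024           # 256 KiB of sample data
print(hdfs_style_checksum(data, 64 * 1024))
print(hdfs_style_checksum(data, 128 * 1024))  # same bytes, different digest
```

The chunk CRCs are identical in both runs; only their grouping into blocks changes, which is exactly why a copy to a cluster with a different default block-size fails verification unless -pb (or -skipCrc) is used.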

> Change distcp to have -pb on by default
> ---------------------------------------
>
>                 Key: HADOOP-8143
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8143
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Dave Thompson
>            Assignee: Dave Thompson
>            Priority: Minor
>
> We should have the preserve blocksize (-pb) on in distcp by default.        
> checksum which is on by default will always fail if blocksize is not the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
