[ https://issues.apache.org/jira/browse/HADOOP-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mithun Radhakrishnan reopened HADOOP-8143:
------------------------------------------

Chaps, would it be ok if we revisited this ask?

1. [~davet]'s original problem remains: copying files between two clusters with different default block-sizes will fail without either -pb or -skipcrccheck. HADOOP-8233 only solves this for 0-byte files.

2. File formats such as ORC perform several optimizations with respect to data stripes and HDFS block sizes. If such files were copied between clusters without preserving block sizes, the result would be degraded performance (at best) or data corruption (at worst).

Would it be acceptable to preserve block sizes by default (i.e. when -p isn't used), but only when both the source and target file-systems are HDFS?

> Change distcp to have -pb on by default
> ---------------------------------------
>
>                 Key: HADOOP-8143
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8143
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Dave Thompson
>            Assignee: Dave Thompson
>            Priority: Minor
>
> We should have the preserve-blocksize option (-pb) on in distcp by default.
> The checksum comparison, which is on by default, will always fail if the block sizes are not the same.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
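For illustration, a minimal sketch of the failure mode and the two work-arounds described above. The NameNode hostnames and paths here are hypothetical; the flags (-pb, -update, -skipcrccheck) are real DistCp options.

```shell
# Copying between two clusters whose dfs.blocksize defaults differ
# (e.g. 128 MB on the source vs 256 MB on the target).
# HDFS block-level checksums depend on the block size, so without -pb
# the post-copy CRC comparison can fail even though the bytes match.
hadoop distcp hdfs://src-nn:8020/data /dst-path

# Work-around 1: preserve the source block size on the target (-pb).
hadoop distcp -pb hdfs://src-nn:8020/data /dst-path

# Work-around 2: skip the CRC comparison entirely (weaker guarantee;
# -skipcrccheck must be used together with -update).
hadoop distcp -update -skipcrccheck hdfs://src-nn:8020/data /dst-path
```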