[ 
https://issues.apache.org/jira/browse/HADOOP-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073870#comment-17073870
 ] 

Steve Loughran commented on HADOOP-8143:
----------------------------------------

This turns out to have caused a couple of regressions when working with object 
storage.

# HADOOP-16932: S3A 404 caching can break copy, because HADOOP-13145 only skips 
the probe if the attribute set is empty (WiP fix: strip out blocksize, 
replication and checksum options)
# HADOOP-16756: incremental backups to s3a, abfs or any other store where 
the blocksize is just some client-side config option *will now always back up 
every single file*. Always. 

Issue number one is straightforward to fix and I am doing so as we speak. Issue 
number two is different, *because there is now no way to say "I don't want 
blocksize preserved"*.

* if we do not want checksum validation (-skipCrcCheck), we don't need to 
preserve block size.
* even if we do want checksums, if HDFS-13056 is enabled, checksums are now 
independent of block size
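
To make the two cases above concrete, here is a minimal sketch of the decision. It is not DistCp code: the class, the method and both parameter names are hypothetical, and it assumes the caller already knows whether -skipCrcCheck is set and whether the block-size-independent checksums from HDFS-13056 are in use.

```java
/**
 * Hypothetical helper, not part of DistCp: illustrates when preserving
 * block size is actually needed for checksum comparison to work.
 */
public final class BlockSizeNeed {

    private BlockSizeNeed() {
    }

    /**
     * @param skipCrcCheck true if checksum validation is disabled (-skipCrcCheck)
     * @param blockSizeIndependentChecksums true if checksums do not depend on
     *        block size (assumed to mean HDFS-13056 is enabled)
     * @return true only if checksums are compared and depend on block size
     */
    public static boolean checksumNeedsBlockSize(boolean skipCrcCheck,
                                                 boolean blockSizeIndependentChecksums) {
        if (skipCrcCheck) {
            // No checksum comparison, so block size cannot affect the result.
            return false;
        }
        // Checksums are compared: block size only matters if they depend on it.
        return !blockSizeIndependentChecksums;
    }

    public static void main(String[] args) {
        System.out.println(checksumNeedsBlockSize(true, false));
        System.out.println(checksumNeedsBlockSize(false, true));
        System.out.println(checksumNeedsBlockSize(false, false));
    }
}
```

Only the last combination (checksums validated, and dependent on block size) gives a reason to preserve block size.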

So what to do here? I don't think we need this and for cloud storage it is a 
major regression.

At the very least, we need a way to turn this new default off, especially when 
-skipCrcCheck is set.



> Change distcp to have -pb on by default
> ---------------------------------------
>
>                 Key: HADOOP-8143
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8143
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Dave Thompson
>            Assignee: Mithun Radhakrishnan
>            Priority: Minor
>             Fix For: 3.0.0-alpha4
>
>         Attachments: HADOOP-8143.1.patch, HADOOP-8143.2.patch, 
> HADOOP-8143.3.patch
>
>
> We should have the preserve blocksize (-pb) on in distcp by default.        
> checksum which is on by default will always fail if blocksize is not the same.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
