[ https://issues.apache.org/jira/browse/HADOOP-16536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917873#comment-16917873 ]
Kai Xie commented on HADOOP-16536:
----------------------------------
This also backports HADOOP-15273, because HADOOP-16158 depends on it to skip
checksum validation between different file systems (for example, HDFS and S3).
The error message hint about checksum combine mode is removed as well, because
that feature has not been ported to branch-2.

> Backport HADOOP-16158 and HADOOP-15273 to branch-2
> --------------------------------------------------
>
> Key: HADOOP-16536
> URL: https://issues.apache.org/jira/browse/HADOOP-16536
> Project: Hadoop Common
> Issue Type: Improvement
> Components: tools/distcp
> Affects Versions: 2.9.2
> Reporter: Kai Xie
> Assignee: Kai Xie
> Priority: Major
>
> Copying blocks in parallel (enabled when blocks per chunk > 0) is a DistCp
> improvement that can greatly speed up copying large files.
> However, checksum validation is skipped for files copied this way, e.g. in
> `RetriableFileCopyCommand.java`:
>
> {code:java}
> if (!source.isSplit()) {
>   compareCheckSums(sourceFS, source.getPath(), sourceChecksum,
>       targetFS, targetPath);
> }
> {code}
> This can result in a checksum or data mismatch that goes unnoticed by
> developers and users (e.g. HADOOP-16049).
> I'd like to provide a patch that adds the checksum validation.
>
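The gap described above can be illustrated with a small JDK-only model. This is a sketch under stated assumptions, not DistCp code and not the proposed patch: the class `ChunkedCopySketch` and its methods are hypothetical, byte arrays stand in for files, and CRC32 stands in for a file-level checksum. It copies the data in parallel chunks and then compares one checksum over the whole reassembled target against the source, which is the kind of check the snippet above skips for split files.

```java
import java.util.stream.IntStream;
import java.util.zip.CRC32;

/** Hypothetical model: chunked parallel copy followed by a whole-file checksum check. */
final class ChunkedCopySketch {

    /** CRC32 over the entire array; a stand-in for a file-level checksum. */
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Copies source into a new array in fixed-size chunks, handled in parallel. */
    static byte[] copyInChunks(byte[] source, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] target = new byte[source.length];
        int chunks = (source.length + chunkSize - 1) / chunkSize;
        // Each chunk writes to its own disjoint region of the target.
        IntStream.range(0, chunks).parallel().forEach(i -> {
            int from = i * chunkSize;
            int len = Math.min(chunkSize, source.length - from);
            System.arraycopy(source, from, target, from, len);
        });
        return target;
    }

    /** Compares whole-file checksums after reassembly; throws on mismatch. */
    static void verify(byte[] source, byte[] target) {
        long expected = checksum(source);
        long actual = checksum(target);
        if (expected != actual) {
            throw new IllegalStateException(String.format(
                "Checksum mismatch: source=%08x target=%08x", expected, actual));
        }
    }

    public static void main(String[] args) {
        byte[] source = new byte[10_000];
        for (int i = 0; i < source.length; i++) {
            source[i] = (byte) (i * 31);
        }
        byte[] target = copyInChunks(source, 1024);
        verify(source, target); // an intact copy passes silently

        target[5000] ^= 1; // simulate silent corruption of a single bit
        try {
            verify(source, target);
            System.out.println("corruption went unnoticed");
        } catch (IllegalStateException e) {
            System.out.println("detected: " + e.getMessage());
        }
    }
}
```

Without the final `verify` call, the flipped bit in the second half of `main` would go unreported, which mirrors the silent mismatch the description warns about.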
--
This message was sent by Atlassian Jira
(v8.3.2#803003)