[ https://issues.apache.org/jira/browse/HDFS-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627181#comment-16627181 ]

Steve Loughran commented on HDFS-3889:
--------------------------------------

This is a real problem, but it's too late to fix. Too many workflows run
distcp without the -skipcrccheck option against stores that don't provide
checksums. They depend on this so heavily that adding checksums to an FS
breaks people's backups (HADOOP-15297).

If this were to be done, it would have to be through a new checksum option.
Something like -checksums with values such as "skip", "enable", "strict",
"ignore-type-mismatch", "metadata", etc.

"strict" would apply the strictest checks possible. "metadata" would check the
metadata, though I think that mode would be hard pressed to work reliably.
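The mode list above can be sketched as a small decision function. This is a minimal illustration, and everything in it is assumed:

* distcp has no -checksums flag.
* ChecksumMode, Checksum and ChecksumPolicy are hypothetical names.
* A null checksum stands for "the store could not supply one".
* "metadata" is omitted because its semantics are not pinned down.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical modes for the proposed "-checksums" option; distcp has no such flag.
// The "metadata" mode is deliberately left out.
enum ChecksumMode { SKIP, ENABLE, STRICT, IGNORE_TYPE_MISMATCH }

// Hypothetical stand-in for a file checksum: an algorithm label plus digest bytes.
final class Checksum {
    final String algorithm;
    final byte[] value;

    Checksum(String algorithm, byte[] value) {
        this.algorithm = Objects.requireNonNull(algorithm);
        this.value = value.clone();
    }
}

final class ChecksumPolicy {
    private ChecksumPolicy() {}

    /**
     * Returns true if the copied file is accepted under the given mode.
     * A null checksum means the store could not supply one.
     */
    static boolean accept(ChecksumMode mode, Checksum src, Checksum dst) {
        if (mode == ChecksumMode.SKIP) {
            return true; // never compare, like -skipcrccheck
        }
        if (src == null || dst == null) {
            // Only STRICT refuses a copy it cannot verify; ENABLE and
            // IGNORE_TYPE_MISMATCH tolerate missing checksums.
            return mode != ChecksumMode.STRICT;
        }
        if (!src.algorithm.equals(dst.algorithm)) {
            // Different algorithms cannot be compared byte-for-byte.
            return mode == ChecksumMode.IGNORE_TYPE_MISMATCH;
        }
        return Arrays.equals(src.value, dst.value);
    }
}
```

In this sketch ENABLE keeps the lenient behaviour existing workflows depend on, and STRICT is strictly opt-in. That opt-in design is what would avoid the breakage described in HADOOP-15297.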


> distcp overwrites files even when there are missing checksums
> -------------------------------------------------------------
>
>                 Key: HDFS-3889
>                 URL: https://issues.apache.org/jira/browse/HDFS-3889
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 2.0.2-alpha
>            Reporter: Colin P. McCabe
>            Priority: Minor
>
> If distcp can't read the checksum files for the source and destination
> files (for any reason), it ignores the checksums and overwrites the
> destination file. It does produce a log message, but I think the correct
> behavior would be to throw an error and stop the distcp.
> If the user really wants to ignore checksums, he or she can use 
> {{-skipcrccheck}} to do so.
> The relevant code is in DistCpUtils#checksumsAreEquals:
> {code}
>     try {
>       sourceChecksum = sourceFS.getFileChecksum(source);
>       targetChecksum = targetFS.getFileChecksum(target);
>     } catch (IOException e) {
>       LOG.error("Unable to retrieve checksum for " + source + " or " + target, e);
>     }
> {code}
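The behaviour the report asks for can be sketched by propagating the failure instead of logging it. This is a minimal sketch, not the DistCpUtils code. The ChecksumReader interface and the StrictChecksums class are hypothetical. ChecksumReader.read(String) stands in for FileSystem#getFileChecksum(Path), returning null when a store has no checksum and throwing IOException when reading fails. Treating a null checksum as an error is the stricter reading, closer to the "strict" mode in the comment.

```java
import java.io.IOException;
import java.util.Arrays;

// Hypothetical stand-in for FileSystem#getFileChecksum(Path): returns null when
// the store has no checksum, throws IOException when reading one fails.
interface ChecksumReader {
    byte[] read(String path) throws IOException;
}

final class StrictChecksums {
    private StrictChecksums() {}

    /**
     * Strict variant of the quoted logic: a read failure or a missing checksum
     * aborts the comparison instead of being logged and ignored.
     */
    static boolean checksumsAreEqual(ChecksumReader sourceFs, String source,
                                     ChecksumReader targetFs, String target,
                                     boolean skipCrc) throws IOException {
        if (skipCrc) {
            return true; // the user explicitly opted out, as with -skipcrccheck
        }
        byte[] sourceChecksum;
        byte[] targetChecksum;
        try {
            sourceChecksum = sourceFs.read(source);
            targetChecksum = targetFs.read(target);
        } catch (IOException e) {
            // Propagate rather than swallow, so the copy fails loudly.
            throw new IOException("Unable to retrieve checksum for " + source
                    + " or " + target + "; use -skipcrccheck to ignore", e);
        }
        if (sourceChecksum == null || targetChecksum == null) {
            throw new IOException("No checksum available for " + source
                    + " or " + target + "; use -skipcrccheck to ignore");
        }
        return Arrays.equals(sourceChecksum, targetChecksum);
    }
}
```

Because it throws on a null checksum, this variant would fail against stores that do not implement checksums. That is the compatibility problem raised in the comment, so in practice it would have to sit behind an opt-in option.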



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
