[ https://issues.apache.org/jira/browse/HADOOP-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-3981:
-------------------------------------------

    Release Note: 
Implemented MD5-of-xxxMD5-of-yyyCRC32, a distributed file checksum algorithm 
for HDFS, where xxx is the number of CRCs per block and yyy is the number of 
bytes per CRC.
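
For illustration only, a minimal local sketch of that composition (per-chunk
CRC32s, per-block MD5 of the CRCs, then a file-level MD5 of the block MD5s).
The bytesPerCRC/crcsPerBlock parameters and the readFully helper are
assumptions of this sketch; in HDFS the per-block MD5s are computed on the
datanodes rather than by a single reader:

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.MessageDigest;
import java.util.zip.CRC32;

public class ComposedChecksumSketch {
  // MD5-of-xxxMD5-of-yyyCRC32 over a local stream.
  // yyy = bytesPerCRC, xxx = crcsPerBlock (illustrative parameters).
  public static byte[] checksum(InputStream in, int bytesPerCRC, int crcsPerBlock)
      throws Exception {
    MessageDigest fileMD5 = MessageDigest.getInstance("MD5");
    byte[] buf = new byte[bytesPerCRC];
    boolean eof = false;
    while (!eof) {
      // Collect the CRC32s of up to crcsPerBlock chunks (one "block").
      ByteArrayOutputStream crcBytes = new ByteArrayOutputStream();
      DataOutputStream crcOut = new DataOutputStream(crcBytes);
      int chunks = 0;
      while (chunks < crcsPerBlock) {
        int n = readFully(in, buf);
        if (n <= 0) { eof = true; break; }
        CRC32 crc = new CRC32();
        crc.update(buf, 0, n);
        crcOut.writeInt((int) crc.getValue());   // CRC32 of one yyy-byte chunk
        chunks++;
      }
      if (chunks == 0) break;
      // MD5 of this block's CRCs, fed into the file-level MD5.
      byte[] blockMD5 = MessageDigest.getInstance("MD5").digest(crcBytes.toByteArray());
      fileMD5.update(blockMD5);
    }
    return fileMD5.digest();                     // MD5-of-xxxMD5-of-yyyCRC32
  }

  // Read up to buf.length bytes; returns -1 at end of stream.
  private static int readFully(InputStream in, byte[] buf) throws Exception {
    int total = 0;
    while (total < buf.length) {
      int n = in.read(buf, total, buf.length - total);
      if (n < 0) break;
      total += n;
    }
    return total == 0 ? -1 : total;
  }

  public static void main(String[] args) throws Exception {
    try (InputStream in = new FileInputStream(args[0])) {
      byte[] md5 = checksum(in, 512, 128);       // example parameters, not HDFS defaults
      StringBuilder hex = new StringBuilder();
      for (byte b : md5) hex.append(String.format("%02x", b));
      System.out.println(hex);
    }
  }
}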

Changed DistCp to use file checksums for comparing files when both the source 
and destination FileSystems support getFileChecksum(...).
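
A sketch of that comparison, assuming the fall-back policy shown in the
comments (getFileChecksum may return null when a FileSystem does not support
checksums; DistCp's actual handling of that case may differ):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChecksumCompareSketch {
  // Returns true if the two files are considered the same by checksum.
  // If either side returns null (checksums unsupported), the comparison is
  // inconclusive and this sketch treats the files as the same.
  static boolean sameChecksum(FileSystem srcFs, Path src,
                              FileSystem dstFs, Path dst) throws IOException {
    FileChecksum srcSum = srcFs.getFileChecksum(src);
    FileChecksum dstSum = dstFs.getFileChecksum(dst);
    if (srcSum == null || dstSum == null) {
      return true;  // checksum not supported on one side; cannot compare
    }
    return srcSum.equals(dstSum);
  }

  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    Path src = new Path(args[0]);
    Path dst = new Path(args[1]);
    System.out.println(
        sameChecksum(src.getFileSystem(conf), src, dst.getFileSystem(conf), dst)
            ? "checksums match" : "checksums differ");
  }
}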

> Need a distributed file checksum algorithm for HDFS
> ---------------------------------------------------
>
>                 Key: HADOOP-3981
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3981
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>         Attachments: 3981_20080909.patch, 3981_20080910.patch, 
> 3981_20080910b.patch
>
>
> Traditional message digest algorithms, such as MD5 and SHA-1, require reading 
> the entire input message sequentially in a central location.  HDFS supports 
> files of multiple terabytes, so the overhead of reading an entire file is 
> huge.  A distributed file checksum algorithm is needed for HDFS.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
