[ https://issues.apache.org/jira/browse/HADOOP-3941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626614#action_12626614 ]

Doug Cutting commented on HADOOP-3941:
--------------------------------------

I don't see the point in passing the checksum algorithm name to 
getFileChecksum().  Do we expect a FileSystem to actually checksum a file on 
demand?  I assume not, and that this feature is primarily for accessing 
pre-computed checksums, and that most filesystems will only support a single 
checksum algorithm.
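A minimal Java sketch of the API shape this argues for, assuming checksums are only ever looked up and never computed on demand: the lookup takes just a path, returns null when no pre-computed checksum exists, and the returned value names its own algorithm so callers can check compatibility themselves. `FileChecksumSketch`, `PrecomputedChecksumStore` and `record` are hypothetical illustrations, not Hadoop's actual `FileSystem` or `FileChecksum` classes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical checksum value that carries the name of its algorithm. */
final class FileChecksumSketch {
    final String algorithmName;
    final byte[] bytes;

    FileChecksumSketch(String algorithmName, byte[] bytes) {
        this.algorithmName = algorithmName;
        this.bytes = bytes.clone();
    }

    /** Equal only if both the algorithm name and the checksum bytes match. */
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FileChecksumSketch)) {
            return false;
        }
        FileChecksumSketch other = (FileChecksumSketch) o;
        return algorithmName.equals(other.algorithmName)
                && Arrays.equals(bytes, other.bytes);
    }

    @Override
    public int hashCode() {
        return 31 * algorithmName.hashCode() + Arrays.hashCode(bytes);
    }
}

/** Hypothetical filesystem that only returns checksums it already stored. */
final class PrecomputedChecksumStore {
    private final Map<String, FileChecksumSketch> stored = new HashMap<>();

    /** Stands in for whatever mechanism pre-computes checksums at write time. */
    void record(String path, FileChecksumSketch checksum) {
        stored.put(path, checksum);
    }

    /** No algorithm argument: return the stored checksum, or null if none. */
    FileChecksumSketch getFileChecksum(String path) {
        return stored.get(path);
    }
}
```

Because the algorithm travels with the value, a caller can detect an algorithm mismatch from the result alone, without asking the filesystem for a specific algorithm.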

There are two primary cases to consider:
  1. Copying files between filesystems that have pre-computed checksums using 
the same algorithm.
  2. Copying files between filesystems which either do not have pre-computed 
checksums or use different algorithms.

In (2), copies should fall back to comparing file lengths or perhaps fail, and 
in (1), we should use checksums.  Right?
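The two cases can be sketched as a small Java verification policy. This is an illustration under assumptions, not distcp code: `CopyVerifier`, its `Checksum` holder and the `Verdict` values are hypothetical. A null checksum stands for "no pre-computed checksum", and case (2) returns a weaker verdict so the caller can choose between accepting a length match and failing.

```java
import java.util.Arrays;

/** Hypothetical copy-verification policy for the two cases above. */
final class CopyVerifier {
    /** Hypothetical pre-computed checksum: algorithm name plus bytes. */
    static final class Checksum {
        final String algorithm;
        final byte[] bytes;

        Checksum(String algorithm, byte[] bytes) {
            this.algorithm = algorithm;
            this.bytes = bytes.clone();
        }
    }

    enum Verdict { SAME, DIFFERENT, LENGTHS_MATCH_ONLY }

    /** A null checksum means that filesystem has no pre-computed checksum. */
    static Verdict verify(long srcLength, Checksum srcSum,
                          long dstLength, Checksum dstSum) {
        // A length mismatch settles the question in either case.
        if (srcLength != dstLength) {
            return Verdict.DIFFERENT;
        }
        // Case 1: both sides have checksums computed with the same algorithm.
        if (srcSum != null && dstSum != null
                && srcSum.algorithm.equals(dstSum.algorithm)) {
            return Arrays.equals(srcSum.bytes, dstSum.bytes)
                    ? Verdict.SAME : Verdict.DIFFERENT;
        }
        // Case 2: checksums missing or from different algorithms. Only the
        // lengths were compared, so the caller decides to accept or fail.
        return Verdict.LENGTHS_MATCH_ONLY;
    }
}
```

Returning a distinct verdict for case (2), rather than silently treating a length match as equality, keeps the "use file lengths or perhaps fail" choice with the caller.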

In any case, hardwiring distcp to use FileLengthChecksum doesn't seem like an 
improvement.

> Extend FileSystem API to return file-checksums/file-digests
> -----------------------------------------------------------
>
>                 Key: HADOOP-3941
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3941
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>         Attachments: 3941_20080818.patch, 3941_20080819.patch, 
> 3941_20080819b.patch, 3941_20080820.patch, 3941_20080826.patch, 
> 3941_20080827.patch
>
>
> Suppose we have two files in two locations (possibly on two different 
> clusters) and these two files have the same size.  How can we tell whether 
> their contents are the same?
> Currently, the only way is to read both files and compare their contents. 
> This is a very expensive operation if the files are huge.
> So, we would like to extend the FileSystem API to support returning 
> file-checksums/file-digests.

-- 
This message is automatically generated by JIRA.