[
https://issues.apache.org/jira/browse/HADOOP-8239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13437980#comment-13437980
]
Kihwal Lee commented on HADOOP-8239:
------------------------------------
I think adding a new class is a good idea. Since DFS.getFileChecksum is expected
to return MD5MD5CRC32FileChecksum in a lot of places, subclassing
MD5MD5CRC32FileChecksum for each variant could work.
We can regard the "CRC32" in MD5MD5CRC32FileChecksum as a generic term for any
32-bit CRC algorithm; at least that is the case in current 2.0/trunk. If we go
with that reading, subclassing MD5MD5CRC32FileChecksum for each variant makes
sense.
The following is what I am thinking:
*In MD5MD5CRC32FileChecksum*
The constructor sets crcType to DataChecksum.Type.CRC32
{code}
// Set to DataChecksum.Type.CRC32 by this class's constructor;
// subclass constructors set their own variant.
protected DataChecksum.Type crcType;

/**
 * getAlgorithmName() will use it to construct the name
 */
private DataChecksum.Type getCrcType() {
  return crcType;
}

public ChecksumOpt getChecksumOpt() {
  return new ChecksumOpt(getCrcType(), bytesPerCrc);
}
{code}
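With that in place, any code holding the base type could recover the exact
checksum parameters, e.g. (a hypothetical caller; fs and path are assumed):
{code}
FileChecksum checksum = fs.getFileChecksum(path);
if (checksum instanceof MD5MD5CRC32FileChecksum) {
  // Works for the base class and for both subclasses below.
  ChecksumOpt opt = ((MD5MD5CRC32FileChecksum) checksum).getChecksumOpt();
  System.out.println(opt.getChecksumType()
      + ", bytesPerChecksum=" + opt.getBytesPerChecksum());
}
{code}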
*Subclass MD5MD5CRC32GzipFileChecksum*
The constructor sets crcType to DataChecksum.Type.CRC32
*Subclass MD5MD5CRC32CastagnoliFileChecksum*
The constructor sets crcType to DataChecksum.Type.CRC32C
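In code, the two subclasses would be thin wrappers. A rough sketch, assuming
constructor signatures mirroring MD5MD5CRC32FileChecksum's existing
(bytesPerCrc, crcPerBlock, md5) constructor:
{code}
public class MD5MD5CRC32GzipFileChecksum extends MD5MD5CRC32FileChecksum {
  public MD5MD5CRC32GzipFileChecksum(int bytesPerCrc, long crcPerBlock,
      MD5Hash md5) {
    super(bytesPerCrc, crcPerBlock, md5);
    crcType = DataChecksum.Type.CRC32;  // the plain/gzip CRC32 polynomial
  }
}

public class MD5MD5CRC32CastagnoliFileChecksum
    extends MD5MD5CRC32FileChecksum {
  public MD5MD5CRC32CastagnoliFileChecksum(int bytesPerCrc, long crcPerBlock,
      MD5Hash md5) {
    super(bytesPerCrc, crcPerBlock, md5);
    crcType = DataChecksum.Type.CRC32C; // the Castagnoli polynomial
  }
}
{code}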
*Interoperability & compatibility*
- Any existing user/hadoop code that expects MD5MD5CRC32FileChecksum from
DFS.getFileChecksum() will continue to work.
- Any new code that makes use of the new getChecksumOpt() will work as long as
DFSClient#getFileChecksum() creates and returns the right object. This will be
done in HDFS-3177; without it, everything will default to CRC32, which is the
current behavior of branch-2/trunk.
- A newer client calling getFileChecksum() against an old cluster over hftp or
webhdfs will work (the result is always CRC32).
- An older client calling getFileChecksum() against a newer cluster: if the
remote file on the newer cluster is in CRC32, both hftp and webhdfs work. If it
is in CRC32C or anything else, hftp will hit a checksum mismatch. Over webhdfs,
the client will receive an algorithm field that won't match anything the old
MD5MD5CRC32FileChecksum can create, and WebHdfsFileSystem will generate an
IOException, "Algorithm not matched:...." (see the sketch below).
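To illustrate that last point, the old client-side check in WebHdfsFileSystem
would behave roughly like this (a sketch only; the algorithm string comes from
the JSON response, and the variable names are made up):
{code}
// Old client rebuilding the checksum from a webhdfs GETFILECHECKSUM response.
final MD5MD5CRC32FileChecksum checksum = new MD5MD5CRC32FileChecksum();
checksum.readFields(in);
// An old MD5MD5CRC32FileChecksum can only produce a CRC32-based algorithm
// name, so a CRC32C file served by a newer cluster fails this check.
if (!checksum.getAlgorithmName().equals(algorithm)) {
  throw new IOException("Algorithm not matched: algorithm=" + algorithm);
}
{code}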
I think this is reasonable. What do you think?
> Extend MD5MD5CRC32FileChecksum to show the actual checksum type being used
> --------------------------------------------------------------------------
>
> Key: HADOOP-8239
> URL: https://issues.apache.org/jira/browse/HADOOP-8239
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Fix For: 2.1.0-alpha
>
> Attachments: hadoop-8239-after-hadoop-8240.patch.txt,
> hadoop-8239-after-hadoop-8240.patch.txt,
> hadoop-8239-before-hadoop-8240.patch.txt,
> hadoop-8239-before-hadoop-8240.patch.txt
>
>
> In order to support HADOOP-8060, MD5MD5CRC32FileChecksum needs to be extended
> to carry the information on the actual checksum type being used. The
> interoperability between the extended version and branch-1 should be
> guaranteed when Filesystem.getFileChecksum() is called over hftp, webhdfs or
> httpfs.