[
https://issues.apache.org/jira/browse/HADOOP-8239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13437733#comment-13437733
]
Tsz Wo (Nicholas), SZE commented on HADOOP-8239:
------------------------------------------------
> One approach I can think of is to leave the current readFields()/write()
> methods unchanged. I think only WebHdfs is using it and if that is true, we
> can make WebHdfs actually send and receive everything in JSON format and keep
> the current "bytes" Json field as is.
FileChecksum is designed to support different kinds of checksum algorithms, so it
declares the following abstract methods:
{code}
public abstract String getAlgorithmName();
public abstract int getLength();
public abstract byte[] getBytes();
{code}
[WebHDFS FileChecksum
schema|http://hadoop.apache.org/common/docs/r1.0.0/webhdfs.html#FileChecksum]
has fields corresponding to these methods.
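For reference, a GETFILECHECKSUM response under that schema would look roughly like
the following (the field values here are illustrative placeholders, not real output):
{code}
{
  "FileChecksum": {
    "algorithm": "MD5-of-1MD5-of-512CRC32",
    "bytes": "<hex-encoded checksum bytes>",
    "length": 28
  }
}
{code}
Note the three fields map one-to-one onto getAlgorithmName(), getBytes() and getLength(),
so the schema stays algorithm-neutral.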
With FileChecksum, clients like WebHDFS can obtain the corresponding checksum by
first reading the checksum algorithm name and then decoding the bytes. If we add
MD5MD5CRC32FileChecksum-specific fields to the JSON format, it becomes harder to
support other algorithms and harder to specify the WebHDFS API, since we would
have to spell out a separate case for each algorithm in the API.
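To illustrate the concern, an algorithm-specific format would need fields like these
(field names here are hypothetical), and every new algorithm would force another
such case into the WebHDFS spec:
{code}
{
  "FileChecksum": {
    "bytesPerCRC": 512,
    "crcPerBlock": 1,
    "md5": "<hex-encoded MD5-of-MD5s digest>"
  }
}
{code}
None of these fields would make sense for a checksum algorithm that is not
MD5-of-MD5-of-CRC based.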
For our task here, we are effectively adding new algorithms, since the algorithm
name has to change for each CRC type. So we may as well add new classes to handle
them instead of changing MD5MD5CRC32FileChecksum. BTW, the name
"MD5MD5CRC32FileChecksum" is not suitable for the other CRC type because it
hard-codes "CRC32". Thoughts?
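A minimal sketch of the idea (this is illustrative only, not the committed patch;
the class and field names below are assumptions): let each CRC variant get its own
algorithm-name suffix, so clients can distinguish CRC32 (gzip polynomial) from
CRC32C without any algorithm-specific JSON fields.

```java
// Minimal stand-in for org.apache.hadoop.fs.FileChecksum's abstract surface.
abstract class FileChecksum {
    public abstract String getAlgorithmName();
    public abstract int getLength();
    public abstract byte[] getBytes();
}

// Hypothetical per-CRC-type checksum class. "crcTypeName" would be
// e.g. "CRC32" for the gzip polynomial or "CRC32C" for Castagnoli.
class Md5Md5CrcFileChecksum extends FileChecksum {
    private final int bytesPerCRC;
    private final long crcPerBlock;
    private final byte[] md5;          // 16-byte MD5-of-block-MD5s digest
    private final String crcTypeName;  // assumption: encodes the CRC variant

    Md5Md5CrcFileChecksum(int bytesPerCRC, long crcPerBlock,
                          byte[] md5, String crcTypeName) {
        this.bytesPerCRC = bytesPerCRC;
        this.crcPerBlock = crcPerBlock;
        this.md5 = md5;
        this.crcTypeName = crcTypeName;
    }

    @Override
    public String getAlgorithmName() {
        // Same shape as MD5MD5CRC32FileChecksum's name, but the CRC
        // variant is appended instead of a hard-coded "CRC32".
        return "MD5-of-" + crcPerBlock + "MD5-of-" + bytesPerCRC + crcTypeName;
    }

    @Override
    public int getLength() { return md5.length; }

    @Override
    public byte[] getBytes() { return md5; }
}
```

With this shape, the WebHDFS JSON schema stays unchanged: only the value of
the "algorithm" field differs between CRC types.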
> Extend MD5MD5CRC32FileChecksum to show the actual checksum type being used
> --------------------------------------------------------------------------
>
> Key: HADOOP-8239
> URL: https://issues.apache.org/jira/browse/HADOOP-8239
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Fix For: 2.1.0-alpha
>
> Attachments: hadoop-8239-after-hadoop-8240.patch.txt,
> hadoop-8239-after-hadoop-8240.patch.txt,
> hadoop-8239-before-hadoop-8240.patch.txt,
> hadoop-8239-before-hadoop-8240.patch.txt
>
>
> In order to support HADOOP-8060, MD5MD5CRC32FileChecksum needs to be extended
> to carry the information on the actual checksum type being used. The
> interoperability between the extended version and branch-1 should be
> guaranteed when Filesystem.getFileChecksum() is called over hftp, webhdfs or
> httpfs.