[
https://issues.apache.org/jira/browse/HADOOP-8239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13437271#comment-13437271
]
Kihwal Lee commented on HADOOP-8239:
------------------------------------
I think XML is fine. XML parsing is done at the document level, so we can
safely detect or ignore the presence of the extra parameter without worrying
about the size of the data. I tried calling getFileChecksum() over Hftp
between a patched 0.23 cluster and a 1.0.x cluster, and it worked fine both
ways.
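For example, an XML reader can simply ask for the optional attribute and
default when it is absent. A minimal sketch, assuming a SAX-style reader and
an illustrative attribute name (not necessarily what Hftp actually emits):

{code:java}
import org.xml.sax.Attributes;

final class ChecksumXmlReader {
  // Assumed default when the attribute is absent (i.e. an old cluster).
  static final int DEFAULT_CRC_TYPE = 0;

  // An old reader never asks for "crctype"; a new reader asks and falls
  // back when talking to an old cluster. Neither side mis-parses the rest.
  static int crcTypeOf(Attributes attrs) {
    String v = attrs.getValue("crctype");
    return (v == null) ? DEFAULT_CRC_TYPE : Integer.parseInt(v);
  }
}
{code}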
The change you suggested does not solve the whole problem. The magic number
acts like a simple binary length field: its presence or absence tells you how
much data you need to read. So the read side of the patched version works
even when reading from an unpatched version, but the reverse is not true: the
unpatched version will always leave something unread in the stream. XML is
nice in that it inherently has begin and end markers and is not sensitive to
size changes.
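To make the asymmetry concrete, here is a minimal sketch of the binary case;
the class and field layout are hypothetical, not the actual
MD5MD5CRC32FileChecksum code:

{code:java}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

final class ChecksumWireFormat {
  // Patched writer: appends a new field after the original payload.
  static void writeNew(DataOutput out, byte[] md5, int crcType)
      throws IOException {
    out.writeInt(md5.length);
    out.write(md5);
    out.writeInt(crcType);   // extra field introduced by the patch
  }

  // Unpatched reader: knows nothing about the extra field.
  static byte[] readOld(DataInput in) throws IOException {
    byte[] md5 = new byte[in.readInt()];
    in.readFully(md5);
    return md5;              // the trailing crcType int is left unread
  }
}
{code}

If the transport expects the payload to be fully consumed, those leftover
bytes corrupt whatever is read from the stream next.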
Since JsonUtil depends on these serialization/deserialization methods, I
don't think we can obtain bidirectional compatibility by modifying only one
side. If it had used XML and had not done the length check, it would have no
such problem. A fully JSON-ized approach could have worked as well.
One approach I can think of is to leave the current readFields()/write()
methods unchanged. I think only WebHdfs is using them, and if that is true,
we can make WebHdfs actually send and receive everything in JSON format while
keeping the current "bytes" JSON field as is. When it does not find the "new"
fields in data from an old source, it can do the old deserialization on
"bytes". Similarly, it should send everything in individual JSON fields as
well as in the old serialized "bytes".
It may be better to move the JSON util methods to MD5MD5CRC32FileChecksum.java,
since they will have to know the internals of MD5MD5CRC32FileChecksum.
> Extend MD5MD5CRC32FileChecksum to show the actual checksum type being used
> --------------------------------------------------------------------------
>
> Key: HADOOP-8239
> URL: https://issues.apache.org/jira/browse/HADOOP-8239
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Fix For: 2.1.0-alpha
>
> Attachments: hadoop-8239-after-hadoop-8240.patch.txt,
> hadoop-8239-before-hadoop-8240.patch.txt
>
>
> In order to support HADOOP-8060, MD5MD5CRC32FileChecksum needs to be extended
> to carry the information on the actual checksum type being used. The
> interoperability between the extended version and branch-1 should be
> guaranteed when FileSystem.getFileChecksum() is called over hftp, webhdfs or
> httpfs.