[ 
https://issues.apache.org/jira/browse/HDFS-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440750#comment-13440750
 ] 

Kihwal Lee commented on HDFS-3177:
----------------------------------

I think checksum consistency in concat() can be enforced by inserting a pre-check. 
It does hit the datanodes, but the cost should be modest since each datanode only 
reads its checksum file and sends back an MD5 of it.

I tested this and it passes TestHDFSConcat.

{code}
  public void concat(String trg, String [] srcs) throws IOException {
    checkOpen();
    try {
+      // check the checksum consistency
+      MD5MD5CRC32FileChecksum csum = null;
+      String src = "";
+      for (String s : srcs) {
+        MD5MD5CRC32FileChecksum csumToCompare = getFileChecksum(s);
+        if (csumToCompare.getChecksumOpt().getChecksumType() ==
+            DataChecksum.Type.MIXED) {
+          throw new IOException("Mixed checksum type detected in " +
+              s + ". This is not supported in concat()");
+        }
+        if (csum == null) {
+          csum = csumToCompare;
+          src = s;
+          continue;
+        }
+        if (csum.getChecksumOpt().getChecksumType() !=
+            csumToCompare.getChecksumOpt().getChecksumType()) {
+          throw new IOException("Checksum types are different between " + s
+              + " and " + src);
+        }
+      }
      namenode.concat(trg, srcs);
    } catch(RemoteException re) {
      throw re.unwrapRemoteException(AccessControlException.class,
                                     UnresolvedPathException.class);
    }
  }
{code}
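For illustration, the pre-check above can be sketched in isolation. This is a minimal, self-contained sketch, not HDFS code: `ConcatChecksumCheck`, its `ChecksumType` enum, and `verifyUniformType()` are hypothetical stand-ins for `MD5MD5CRC32FileChecksum.getChecksumOpt().getChecksumType()` and `DataChecksum.Type`, used only to show the reject-MIXED-then-compare pattern.

{code}
import java.util.List;

// Hypothetical, simplified stand-in for the HDFS checksum types, to
// illustrate the pre-check pattern used in the patch above.
public class ConcatChecksumCheck {
  enum ChecksumType { CRC32, CRC32C, MIXED }

  // Mirrors the patch: reject any MIXED source, then require that every
  // source shares the type of the first one. Throws IllegalStateException
  // where the patch throws IOException.
  static ChecksumType verifyUniformType(List<ChecksumType> sourceTypes) {
    ChecksumType first = null;
    for (ChecksumType t : sourceTypes) {
      if (t == ChecksumType.MIXED) {
        throw new IllegalStateException(
            "Mixed checksum type detected. This is not supported in concat()");
      }
      if (first == null) {
        first = t;
      } else if (t != first) {
        throw new IllegalStateException("Checksum types are different between sources");
      }
    }
    return first;
  }

  public static void main(String[] args) {
    // All sources agree, so the shared type is returned.
    System.out.println(verifyUniformType(
        List.of(ChecksumType.CRC32C, ChecksumType.CRC32C))); // prints CRC32C
  }
}
{code}

Note the check is advisory: it runs client-side before namenode.concat(), so a racing writer could still change a file between the check and the concat.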
                
> Allow DFSClient to find out and use the CRC type being used for a file.
> -----------------------------------------------------------------------
>
>                 Key: HDFS-3177
>                 URL: https://issues.apache.org/jira/browse/HDFS-3177
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node, hdfs client
>    Affects Versions: 0.23.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 2.1.0-alpha, 3.0.0
>
>         Attachments: hdfs-3177-after-hadoop-8239-8240.patch.txt, 
> hdfs-3177-after-hadoop-8239.patch.txt, hdfs-3177-branch2-trunk.patch.txt, 
> hdfs-3177-branch2-trunk.patch.txt, hdfs-3177-branch2-trunk.patch.txt, 
> hdfs-3177-branch2-trunk.patch.txt, hdfs-3177-branch2-trunk.patch.txt, 
> hdfs-3177.patch, hdfs-3177-with-hadoop-8239-8240.patch.txt, 
> hdfs-3177-with-hadoop-8239-8240.patch.txt, 
> hdfs-3177-with-hadoop-8239-8240.patch.txt, 
> hdfs-3177-with-hadoop-8239.patch.txt
>
>
> To support HADOOP-8060, DFSClient should be able to find out the checksum 
> type being used for files in HDFS.
> In my prototype, DataTransferProtocol was extended to include the checksum 
> type in the blockChecksum() response. DFSClient uses it in getFileChecksum() 
> to determine the checksum type. Also, append() can be configured to use the 
> existing checksum type instead of the configured one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
