[
https://issues.apache.org/jira/browse/HDFS-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440750#comment-13440750
]
Kihwal Lee commented on HDFS-3177:
----------------------------------
I think consistent checksum types in concat() can be enforced by inserting checks on the client side. It does hit the datanodes, but the cost is modest since each getFileChecksum() call only reads the checksum file and returns its MD5. I tested this change and it passes TestHDFSConcat.
{code}
  public void concat(String trg, String [] srcs) throws IOException {
    checkOpen();
    try {
+     // check the checksum consistency
+     MD5MD5CRC32FileChecksum csum = null;
+     String src = "";
+     for (String s : srcs) {
+       MD5MD5CRC32FileChecksum csumToCompare = getFileChecksum(s);
+       if (csumToCompare.getChecksumOpt().getChecksumType() ==
+           DataChecksum.Type.MIXED) {
+         throw new IOException("Mixed checksum type detected in " +
+             s + ". This is not supported in concat()");
+       }
+       if (csum == null) {
+         csum = csumToCompare;
+         src = s;
+         continue;
+       }
+       if (csum.getChecksumOpt().getChecksumType() !=
+           csumToCompare.getChecksumOpt().getChecksumType()) {
+         throw new IOException("Checksum types are different between " + s
+             + " and " + src);
+       }
+     }
      namenode.concat(trg, srcs);
    } catch(RemoteException re) {
      throw re.unwrapRemoteException(AccessControlException.class,
                                     UnresolvedPathException.class);
    }
  }
{code}
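Stripped of the HDFS plumbing, the check above boils down to two rules: reject any source file whose checksum type is MIXED, and require every source to share the type of the first one. A minimal, self-contained sketch of just that logic (ChecksumType and checkConsistent here are stand-ins for DataChecksum.Type and the loop in the patch, not Hadoop APIs):

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class ConcatChecksumCheck {
  // Stand-in for org.apache.hadoop.util.DataChecksum.Type
  enum ChecksumType { CRC32, CRC32C, MIXED }

  // Mirrors the patch: reject MIXED outright, and require all
  // sources to match the checksum type of the first source.
  static void checkConsistent(List<ChecksumType> srcTypes) throws IOException {
    ChecksumType first = null;
    for (ChecksumType t : srcTypes) {
      if (t == ChecksumType.MIXED) {
        throw new IOException("Mixed checksum type detected");
      }
      if (first == null) {
        first = t;
      } else if (t != first) {
        throw new IOException("Checksum types are different");
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // Same type everywhere: passes silently.
    checkConsistent(Arrays.asList(ChecksumType.CRC32, ChecksumType.CRC32));

    // Mismatched types: rejected, as concat() would be.
    try {
      checkConsistent(Arrays.asList(ChecksumType.CRC32, ChecksumType.CRC32C));
    } catch (IOException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Note this only inspects the type field; the real patch additionally pays one blockChecksum() round trip per source file to obtain it.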
> Allow DFSClient to find out and use the CRC type being used for a file.
> -----------------------------------------------------------------------
>
> Key: HDFS-3177
> URL: https://issues.apache.org/jira/browse/HDFS-3177
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: data-node, hdfs client
> Affects Versions: 0.23.0
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Fix For: 2.1.0-alpha, 3.0.0
>
> Attachments: hdfs-3177-after-hadoop-8239-8240.patch.txt,
> hdfs-3177-after-hadoop-8239.patch.txt, hdfs-3177-branch2-trunk.patch.txt,
> hdfs-3177-branch2-trunk.patch.txt, hdfs-3177-branch2-trunk.patch.txt,
> hdfs-3177-branch2-trunk.patch.txt, hdfs-3177-branch2-trunk.patch.txt,
> hdfs-3177.patch, hdfs-3177-with-hadoop-8239-8240.patch.txt,
> hdfs-3177-with-hadoop-8239-8240.patch.txt,
> hdfs-3177-with-hadoop-8239-8240.patch.txt,
> hdfs-3177-with-hadoop-8239.patch.txt
>
>
> To support HADOOP-8060, DFSClient should be able to find out the checksum
> type being used for files in hdfs.
> In my prototype, DataTransferProtocol was extended to include the checksum
> type in the blockChecksum() response. DFSClient uses it in getFileChecksum()
> to determine the checksum type. Also append() can be configured to use the
> existing checksum type instead of the configured one.