Laurent Goujon created HDFS-5798:
------------------------------------

             Summary: DFSClient uses non-valid data when computing file checksum
                 Key: HDFS-5798
                 URL: https://issues.apache.org/jira/browse/HDFS-5798
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 2.0.5-alpha, 1.1.2
            Reporter: Laurent Goujon

In DFSClient.java, when computing the file checksum, the MD5 checksums of all blocks are fetched and written to a DataOutputBuffer instance (md5out). The final checksum is then computed this way:

{code:title=DFSClient.java}
final MD5Hash fileMD5 = MD5Hash.digest(md5out.getData());
{code}

The problem is that getData() returns a buffer that is only valid up to md5out.getLength(). As a result, fileMD5 is the MD5 of the per-block MD5s PLUS whatever bytes sit in the unused tail of the buffer. Here the buffer is not reused, so those bytes should be 0, but the result still depends on the Java implementation of ByteArrayOutputStream.
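
To illustrate the effect without a Hadoop dependency, here is a minimal sketch in plain Java. ExposedBuffer is a hypothetical stand-in for md5out (it is not a Hadoop class), and java.security.MessageDigest is used in place of MD5Hash; both are assumptions made for the demo. It digests the same buffer twice: once over the whole backing array, and once over only the first getLength() bytes, which is one possible way to avoid the problem.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class ChecksumBufferDemo {

    /** Hypothetical stand-in for md5out: exposes the backing array and the valid length. */
    static final class ExposedBuffer extends ByteArrayOutputStream {
        ExposedBuffer(int capacity) {
            super(capacity);
        }

        /** Returns the whole backing array, including the unused tail. */
        byte[] getData() {
            return buf;
        }

        /** Returns the number of bytes actually written. */
        int getLength() {
            return count;
        }
    }

    /** MD5 of data[off .. off+len), using the JDK digest instead of Hadoop's MD5Hash. */
    static byte[] md5(byte[] data, int off, int len) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(data, off, len);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // Capacity (32) is deliberately larger than what gets written (16).
        ExposedBuffer md5out = new ExposedBuffer(32);

        // Pretend this is the 16-byte MD5 of a single block.
        byte[] blockMd5 = new byte[16];
        Arrays.fill(blockMd5, (byte) 0x5a);
        md5out.write(blockMd5, 0, blockMd5.length);

        byte[] data = md5out.getData();
        byte[] overWholeArray = md5(data, 0, data.length);        // also hashes the unused tail
        byte[] overValidBytes = md5(data, 0, md5out.getLength()); // hashes only the bytes written

        System.out.println("backing array length: " + data.length);
        System.out.println("valid length:         " + md5out.getLength());
        System.out.println("digests equal:        " + Arrays.equals(overWholeArray, overValidBytes));
    }
}
```

The two digests differ because the first one also covers the zero-filled tail of the array. How long that tail is depends on the buffer's initial capacity and growth policy, which is why the checksum computed from getData() alone is implementation dependent.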