Laurent Goujon created HDFS-5798:
------------------------------------

             Summary: DFSClient uses non-valid data when computing file checksum
                 Key: HDFS-5798
                 URL: https://issues.apache.org/jira/browse/HDFS-5798
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 2.0.5-alpha, 1.1.2
            Reporter: Laurent Goujon


In DFSClient.java, when computing a file checksum, the MD5 checksum of each 
block is fetched and appended to a DataOutputBuffer instance (md5out). The 
final checksum is then computed this way:

{code:title=DFSClient.java}
final MD5Hash fileMD5 = MD5Hash.digest(md5out.getData());
{code}

The problem is that getData() returns the whole backing buffer, of which only 
the first md5out.getLength() bytes are valid. As a result, fileMD5 is the MD5 
of the per-block MD5s PLUS whatever trailing bytes the buffer holds (here the 
buffer is not reused, so they should be zeros). How many trailing bytes there 
are depends on the Java implementation of ByteArrayOutputStream.
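
The effect can be reproduced without Hadoop. The following is only a minimal 
sketch: ExposedBuffer is a hypothetical stand-in for md5out (it is not a 
Hadoop class) that exposes the backing array of a ByteArrayOutputStream, and 
the 16 written bytes stand in for one block's MD5. It hashes the whole backing 
array and then only the valid prefix, and shows that the two digests differ.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class Md5PaddingDemo {

    /** Hypothetical stand-in for md5out: exposes the backing array. */
    static class ExposedBuffer extends ByteArrayOutputStream {
        ExposedBuffer(int capacity) { super(capacity); }
        byte[] getData() { return buf; }    // whole backing array, may be longer than the data
        int getLength() { return count; }   // number of valid bytes
    }

    /** MD5 of data[off .. off+len). */
    static byte[] md5(byte[] data, int off, int len) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(data, off, len);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds a buffer with 16 valid bytes inside a 32-byte backing array. */
    static ExposedBuffer sampleBuffer() {
        ExposedBuffer out = new ExposedBuffer(32);
        byte[] blockMd5 = new byte[16];     // placeholder for one block's MD5
        for (int i = 0; i < blockMd5.length; i++) {
            blockMd5[i] = (byte) (i + 1);
        }
        out.write(blockMd5, 0, blockMd5.length);
        return out;
    }

    /** True if hashing the whole array differs from hashing the valid prefix. */
    static boolean digestsDiffer() {
        ExposedBuffer out = sampleBuffer();
        byte[] whole = md5(out.getData(), 0, out.getData().length);
        byte[] valid = md5(out.getData(), 0, out.getLength());
        return !Arrays.equals(whole, valid);
    }

    public static void main(String[] args) {
        ExposedBuffer out = sampleBuffer();
        System.out.println("backing array length: " + out.getData().length);
        System.out.println("valid length:         " + out.getLength());
        System.out.println("digests differ:       " + digestsDiffer());
    }
}
```

Restricting the digest to the first getLength() bytes, as the second md5() 
call does, is one way to avoid hashing the trailing bytes.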




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
