Kai Zheng created HDFS-9705:
-------------------------------

Summary: Refine the behaviour of getFileChecksum when length = 0
Key: HDFS-9705
URL: https://issues.apache.org/jira/browse/HDFS-9705
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Kai Zheng
Assignee: Kai Zheng
Priority: Minor

{{FileSystem#getFileChecksum}} accepts an optional {{length}} parameter, and 0 is a valid value for it. Currently the call returns {{null}} when {{length}} is 0, from the following code block:

{code}
    //compute file MD5
    final MD5Hash fileMD5 = MD5Hash.digest(md5out.getData());
    switch (crcType) {
      case CRC32:
        return new MD5MD5CRC32GzipFileChecksum(bytesPerCRC, crcPerBlock, fileMD5);
      case CRC32C:
        return new MD5MD5CRC32CastagnoliFileChecksum(bytesPerCRC, crcPerBlock, fileMD5);
      default:
        // If there is no block allocated for the file,
        // return one with the magic entry that matches what previous
        // hdfs versions return.
        if (locatedblocks.size() == 0) {
          return new MD5MD5CRC32GzipFileChecksum(0, 0, fileMD5);
        }

        // we should never get here since the validity was checked
        // when getCrcType() was called above.
        return null;
    }
{code}

The comment says "we should never get here since the validity was checked", but with {{length}} = 0 this path is in fact reached. Since we use the MD5-MD5-X approach, {{EMPTY--CONTENT}} is a valid case, for which the MD5 value is {{d41d8cd98f00b204e9800998ecf8427e}}. I therefore suggest we return a reasonable value instead of {{null}}. The returned value would then at least carry some useful information, such as the values from the block checksum header.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
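
As a quick sanity check of the MD5 value quoted above for empty content, here is a minimal standalone sketch. It uses only the JDK's {{java.security.MessageDigest}}, not Hadoop's {{MD5Hash}}, and the class and method names ({{EmptyContentMd5}}, {{md5Hex}}) are made up for illustration; it is not part of any proposed patch.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not part of HDFS.
class EmptyContentMd5 {

    // Returns the MD5 digest of the given bytes as a lowercase hex string.
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to avoid sign extension when formatting a byte.
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every compliant JDK is required to ship an MD5 implementation.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // Digest of zero bytes, i.e. the EMPTY--CONTENT case.
        System.out.println(md5Hex(new byte[0]));
    }
}
```

Running {{main}} prints {{d41d8cd98f00b204e9800998ecf8427e}}, so a checksum object built from an empty digest input would still carry a well-defined MD5 rather than nothing.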