[ https://issues.apache.org/jira/browse/HDFS-13191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375334#comment-16375334 ]
genericqa commented on HDFS-13191:
----------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
|  0 | reexec       |   0m 16s | Docker mode activated. |
|| || Prechecks || ||
| +1 | @author      |   0m  0s | The patch does not contain any @author tags. |
| +1 | test4tests   |   0m  0s | The patch appears to include 1 new or modified test files. |
|| || trunk Compile Tests || ||
|  0 | mvndep       |   1m 13s | Maven dependency ordering for branch |
| +1 | mvninstall   |  17m  0s | trunk passed |
| +1 | compile      |  13m 35s | trunk passed |
| +1 | checkstyle   |   2m 11s | trunk passed |
| +1 | mvnsite      |   3m 19s | trunk passed |
| +1 | shadedclient |  16m 12s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs     |   5m 17s | trunk passed |
| +1 | javadoc      |   2m 37s | trunk passed |
|| || Patch Compile Tests || ||
|  0 | mvndep       |   0m 17s | Maven dependency ordering for patch |
| +1 | mvninstall   |   2m 32s | the patch passed |
| +1 | compile      |  12m 42s | the patch passed |
| +1 | javac        |  12m 42s | the patch passed |
| +1 | checkstyle   |   2m  7s | the patch passed |
| +1 | mvnsite      |   3m  2s | the patch passed |
| +1 | whitespace   |   0m  0s | The patch has no whitespace issues. |
| +1 | shadedclient |  10m 26s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs     |   5m 51s | the patch passed |
| +1 | javadoc      |   2m 32s | the patch passed |
|| || Other Tests || ||
| +1 | unit         |   8m 38s | hadoop-common in the patch passed. |
| +1 | unit         |   1m 32s | hadoop-hdfs-client in the patch passed. |
| -1 | unit         |  98m  7s | hadoop-hdfs in the patch failed. |
| +1 | asflicense   |   0m 38s | The patch does not generate ASF License warnings. |
|    |              | 207m 52s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|                    | hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages |
|                    | hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation |
|                    | hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13191 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12911860/HDFS-13191.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux c40c49a3aaf7 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 329a4fd |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/23185/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/23185/testReport/ |
| Max. process+thread count | 4194 (vs. ulimit of 10000) |
| modules | C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: . |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/23185/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> Internal buffer-sizing details are inadvertently baked into FileChecksum and BlockGroupChecksum
> -----------------------------------------------------------------------------------------------
>
>                 Key: HDFS-13191
>                 URL: https://issues.apache.org/jira/browse/HDFS-13191
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, hdfs-client
>            Reporter: Dennis Huo
>            Priority: Minor
>         Attachments: HDFS-13191.001.patch
>
> org.apache.hadoop.io.DataOutputBuffer is used as an "optimization" in many places to provide a reusable form of ByteArrayOutputStream, but it requires the caller to be careful to use getLength(), rather than getData().length, to determine the number of actually valid bytes to consume.
> At least three places in the path of constructing FileChecksums use DataOutputBuffer incorrectly:
> [FileChecksumHelper digesting block MD5s|https://github.com/apache/hadoop/blob/329a4fdd07ab007615f34c8e0e651360f988064d/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/FileChecksumHelper.java#L239]
> [BlockChecksumHelper digesting striped block MD5s to construct the block-group checksum|https://github.com/apache/hadoop/blob/329a4fdd07ab007615f34c8e0e651360f988064d/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockChecksumHelper.java#L412]
> [MD5MD5CRC32FileChecksum.getBytes()|https://github.com/apache/hadoop/blob/329a4fdd07ab007615f34c8e0e651360f988064d/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/MD5MD5CRC32FileChecksum.java#L76]
> The net effect is that FileChecksum consumes exactly the BlockChecksums when there are 1 or 2 blocks (at 16 and 32 bytes respectively), but at 3 blocks it rounds up to 64 bytes, effectively returning the same FileChecksum as if there were 4 blocks and the 4th block happened to have an MD5 exactly equal to 0x00...00. Similarly, BlockGroupChecksum behaves as if the BlockGroup contained a power-of-2 number of bytes from BlockChecksums.
> This appears to have been a latent bug for at least 9 years for FileChecksum (and since inception for the implementation of striped files), and works fine as long as HDFS implementations strictly stick to the same internal buffering semantics.
> However, it also makes the implementation extremely brittle unless carefully documented. For example, if the code is ever refactored to pass around a MessageDigest that consumes block MD5s as they arrive, rather than writing them into a DataOutputBuffer and then digesting the entire buffer, the resulting checksum calculations will change unexpectedly.
> At the same time, "fixing" the bug would also be backwards-incompatible, so the bug might need to stick around. At least for the FileChecksum-level calculation, the bug has been latent for a very long time. Since striped files are fairly new, BlockChecksumHelper could probably be fixed sooner rather than later to avoid perpetuating the bug. The getBytes() method of MD5MD5CRC32FileChecksum is more innocuous, so it could likely be fixed or left as-is without much impact either way.
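The getData()/getLength() hazard the report describes can be sketched without Hadoop itself. The `GrowableBuffer` class below is a hypothetical stand-in that mimics DataOutputBuffer's doubling backing array (it is not Hadoop's actual implementation); with three 16-byte block MD5s written, the backing array grows to 64 bytes while only 48 are valid, so digesting `getData()` wholesale silently includes 16 padding zeros:

```java
import java.security.MessageDigest;
import java.util.Arrays;

public class ChecksumBufferDemo {
    // Hypothetical stand-in for org.apache.hadoop.io.DataOutputBuffer:
    // a growable buffer whose backing array may exceed the valid data.
    static class GrowableBuffer {
        private byte[] data = new byte[16]; // illustrative initial capacity
        private int count = 0;

        void write(byte[] src) {
            if (count + src.length > data.length) {
                // Double the backing array, like typical growable buffers.
                data = Arrays.copyOf(data, Math.max(data.length * 2, count + src.length));
            }
            System.arraycopy(src, 0, data, count, src.length);
            count += src.length;
        }
        byte[] getData() { return data; }   // raw backing array, may have trailing zeros
        int getLength() { return count; }   // number of actually valid bytes
    }

    public static void main(String[] args) throws Exception {
        GrowableBuffer buf = new GrowableBuffer();
        for (int i = 0; i < 3; i++) {       // three 16-byte "block MD5s"
            byte[] blockMd5 = new byte[16];
            Arrays.fill(blockMd5, (byte) (i + 1));
            buf.write(blockMd5);
        }
        // Buggy pattern: digests the whole backing array (64 bytes here),
        // including the padding zeros beyond getLength().
        byte[] buggy = MessageDigest.getInstance("MD5").digest(buf.getData());

        // Correct pattern: digest only the first getLength() valid bytes.
        MessageDigest md = MessageDigest.getInstance("MD5");
        md.update(buf.getData(), 0, buf.getLength());
        byte[] correct = md.digest();

        System.out.println("valid bytes:   " + buf.getLength());      // 48
        System.out.println("backing array: " + buf.getData().length); // 64
        System.out.println("digests equal: " + Arrays.equals(buggy, correct));
    }
}
```

With 1 or 2 writes the backing array happens to equal the valid length (16 and 32 bytes), so the two patterns agree by accident; only the third write exposes the difference, matching the 1/2-block vs 3-block behavior described above.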
> The bug can be highlighted by changing the internal buffer-growing semantics of DataOutputBuffer, or simply by returning a randomly-sized byte buffer from getData() while only guaranteeing that the first getLength() bytes are actually present, for example:
>
> {code:java}
> diff --git a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java
> index 4c2fa67f8f2..f2df94e898f 100644
> --- a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java
> +++ b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java
> @@ -103,7 +103,17 @@ private DataOutputBuffer(Buffer buffer) {
>    /** Returns the current contents of the buffer.
>     *  Data is only valid to {@link #getLength()}.
>     */
> -  public byte[] getData() { return buffer.getData(); }
> +  public byte[] getData() {
> +    java.util.Random rand = new java.util.Random();
> +    byte[] bufferData = buffer.getData();
> +    byte[] ret = new byte[rand.nextInt(bufferData.length) + bufferData.length];
> +    System.arraycopy(bufferData, 0, ret, 0, getLength());
> +    return ret;
> +  }
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org