[ https://issues.apache.org/jira/browse/HDFS-15644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17220819#comment-17220819 ]
Hadoop QA commented on HDFS-15644: ---------------------------------- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 16m 35s{color} | | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} branch-2.10 Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 35s{color} | | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s{color} | | {color:green} branch-2.10 passed with JDK Oracle Corporation-1.7.0_95-b00 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | | {color:green} branch-2.10 passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~16.04-b01 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s{color} | | {color:green} branch-2.10 passed with JDK Oracle Corporation-1.7.0_95-b00 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | | {color:green} branch-2.10 passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~16.04-b01 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 2m 34s{color} | | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 32s{color} | | {color:green} branch-2.10 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 52s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | | {color:green} the patch passed with JDK Oracle Corporation-1.7.0_95-b00 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | | {color:green} the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~16.04-b01 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} blanks {color} | {color:green} 0m 0s{color} | | {color:green} The patch has no blanks issues. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 27s{color} | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt|https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/265/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt] | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 5 new + 105 unchanged - 5 fixed = 110 total (was 110) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s{color} | | {color:green} the patch passed with JDK Oracle Corporation-1.7.0_95-b00 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | | {color:green} the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~16.04-b01 {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 33s{color} | | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 39s{color} | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt|https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/265/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt] | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}121m 12s{color} | | {color:black}{color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl | | | hadoop.hdfs.TestRollingUpgrade | \\ \\ || Subsystem || Report/Notes || | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/265/artifact/out/Dockerfile | | JIRA Issue | HDFS-15644 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13014133/HDFS-15644-branch-2.10.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux e785e019bd32 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | branch-2.10 / 90ebbaa3938a5c2908cb53fb9cd2d6630a949dc3 | | Default Java | Private Build-1.8.0_265-8u265-b01-0ubuntu2~16.04-b01 | | Multi-JDK versions | /usr/lib/jvm/java-7-openjdk-amd64:Oracle Corporation-1.7.0_95-b00 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_265-8u265-b01-0ubuntu2~16.04-b01 | | Test Results | https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/265/testReport/ | | Max. process+thread count | 2174 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/265/console | | versions | git=2.7.4 maven=3.3.9 findbugs=3.0.1 | | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org | This message was automatically generated. > Failed volumes can cause DNs to stop block reporting > ---------------------------------------------------- > > Key: HDFS-15644 > URL: https://issues.apache.org/jira/browse/HDFS-15644 > Project: Hadoop HDFS > Issue Type: Bug > Components: block placement, datanode > Reporter: Ahmed Hussein > Assignee: Ahmed Hussein > Priority: Major > Labels: refactor > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5 > > Attachments: HDFS-15644-branch-2.10.002.patch, HDFS-15644.001.patch, > HDFS-15644.002.patch > > > [~daryn] found a corner case where remove failed volumes can cause a NPE in > [FsDataSetImpl.getBlockReports()|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L1939]. > +Scenario:+ > * Inside {{Datanode#HandleVolumeFailures()}}, removing a failed volume is a > 2-step process. > ** First it's removed from from the volumes list > ** Later in time are the replicas scrubbed from the volume map > * A concurrent thread generating blockReports may access the replicaMap > accessing a non existing VolumeID. > He made a fix for that and we have been using it on our clusters since > Hadoop-2.7. > By analyzing the code, the bug is still applicable to Trunk. > * The path Datanode#removeVolumes() is safe because the two step process in > {{FsDataImpl.removeVolumes()}} > [FsDatasetImpl.java#L577|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L577] > is protected by {{datasetWriteLock}} . > * The path Datanode#handleVolumeFailures() is not safe because the failed > volume is removed from the list without acquiring > {{datasetWriteLock}}.[FsVolumList#239|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java#L239] > The race condition can cause the caller of getBlockReports() to throw NPE if > the RUR is referring to a volume that has already been removed > [FsDatasetImpl.java#L1976|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L1976]. > {code:java} > case RUR: > ReplicaInfo orig = b.getOriginalReplica(); > builders.get(volStorageID).add(orig); > break; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org