[ 
https://issues.apache.org/jira/browse/HDFS-15644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17220819#comment-17220819
 ] 

Hadoop QA commented on HDFS-15644:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 16m 
35s{color} |  | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} |  | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch does not contain any @author tags. 
{color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} |  | {color:red} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.10 Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
35s{color} |  | {color:green} branch-2.10 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} |  | {color:green} branch-2.10 passed with JDK Oracle 
Corporation-1.7.0_95-b00 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} |  | {color:green} branch-2.10 passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~16.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} |  | {color:green} branch-2.10 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} |  | {color:green} branch-2.10 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
17s{color} |  | {color:green} branch-2.10 passed with JDK Oracle 
Corporation-1.7.0_95-b00 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} |  | {color:green} branch-2.10 passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~16.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m 
34s{color} |  | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
32s{color} |  | {color:green} branch-2.10 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
52s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} |  | {color:green} the patch passed with JDK Oracle 
Corporation-1.7.0_95-b00 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~16.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
46s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} blanks {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch has no blanks issues. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 27s{color} | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt|https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/265/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt]
 | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 5 new + 
105 unchanged - 5 fixed = 110 total (was 110) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
12s{color} |  | {color:green} the patch passed with JDK Oracle 
Corporation-1.7.0_95-b00 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~16.04-b01 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
33s{color} |  | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 39s{color} 
| 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt|https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/265/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt]
 | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} |  | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}121m 12s{color} | 
 | {color:black}{color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl |
|   | hadoop.hdfs.TestRollingUpgrade |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/265/artifact/out/Dockerfile
 |
| JIRA Issue | HDFS-15644 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13014133/HDFS-15644-branch-2.10.002.patch
 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux e785e019bd32 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | branch-2.10 / 90ebbaa3938a5c2908cb53fb9cd2d6630a949dc3 |
| Default Java | Private Build-1.8.0_265-8u265-b01-0ubuntu2~16.04-b01 |
| Multi-JDK versions | /usr/lib/jvm/java-7-openjdk-amd64:Oracle 
Corporation-1.7.0_95-b00 /usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~16.04-b01 |
|  Test Results | 
https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/265/testReport/ |
| Max. process+thread count | 2174 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/265/console |
| versions | git=2.7.4 maven=3.3.9 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> Failed volumes can cause DNs to stop block reporting
> ----------------------------------------------------
>
>                 Key: HDFS-15644
>                 URL: https://issues.apache.org/jira/browse/HDFS-15644
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: block placement, datanode
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Major
>              Labels: refactor
>             Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5
>
>         Attachments: HDFS-15644-branch-2.10.002.patch, HDFS-15644.001.patch, 
> HDFS-15644.002.patch
>
>
> [~daryn] found a corner case where remove failed volumes can cause a NPE in 
> [FsDataSetImpl.getBlockReports()|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L1939].
> +Scenario:+
>  * Inside {{Datanode#HandleVolumeFailures()}}, removing a failed volume is a 
> 2-step process.
>  ** First it's removed from from the volumes list
>  ** Later in time are the replicas scrubbed from the volume map
>  * A concurrent thread generating blockReports may access the replicaMap 
> accessing a non existing VolumeID.
> He made a fix for that and we have been using it on our clusters since 
> Hadoop-2.7.
> By analyzing the code, the bug is still applicable to Trunk.
>  * The path Datanode#removeVolumes() is safe because the two step process in 
> {{FsDataImpl.removeVolumes()}} 
> [FsDatasetImpl.java#L577|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L577]
>  is protected by {{datasetWriteLock}} .
>  * The path Datanode#handleVolumeFailures() is not safe because the failed 
> volume is removed from the list without acquiring 
> {{datasetWriteLock}}.[FsVolumList#239|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java#L239]
> The race condition can cause the caller of getBlockReports() to throw NPE if 
> the RUR is referring to a volume that has already been removed 
> [FsDatasetImpl.java#L1976|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L1976].
> {code:java}
>         case RUR:
>           ReplicaInfo orig = b.getOriginalReplica();
>           builders.get(volStorageID).add(orig);
>           break;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to