[ 
https://issues.apache.org/jira/browse/HDFS-15644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17218581#comment-17218581
 ] 

Hadoop QA commented on HDFS-15644:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
45s{color} |  | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} |  | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch does not contain any @author tags. 
{color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} |  | {color:red} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
57s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
18s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
11s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
20s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 37s{color} |  | {color:green} branch has no errors when building and 
testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
23s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  3m 
16s{color} |  | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
13s{color} |  | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
13s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
11s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
11s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
5s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
5s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} blanks {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch has no blanks issues. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
13s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m  5s{color} |  | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
19s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
26s{color} |  | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}145m  6s{color} 
| 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt|https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/253/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt]
 | {color:red} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
43s{color} |  | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}220m 46s{color} | 
 | {color:black}{color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestMultiThreadedHflush |
|   | 
hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerWithStripedBlocks |
|   | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |
|   | hadoop.hdfs.TestDFSStorageStateRecovery |
|   | hadoop.hdfs.server.balancer.TestBalancerRPCDelay |
|   | hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy |
|   | hadoop.hdfs.TestFileChecksumCompositeCrc |
|   | hadoop.hdfs.TestGetFileChecksum |
|   | hadoop.hdfs.server.namenode.TestAddStripedBlockInFBR |
|   | hadoop.hdfs.TestSafeModeWithStripedFile |
|   | hadoop.hdfs.TestParallelUnixDomainRead |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.hdfs.TestErasureCodingPolicies |
|   | hadoop.hdfs.TestDecommissionWithStriped |
|   | hadoop.hdfs.TestDistributedFileSystemWithECFileWithRandomECPolicy |
|   | hadoop.hdfs.TestDFSStripedInputStream |
|   | hadoop.hdfs.TestErasureCodeBenchmarkThroughput |
|   | hadoop.hdfs.tools.TestDFSAdminWithHA |
|   | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover |
|   | hadoop.hdfs.TestFileAppend2 |
|   | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
|   | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
|   | hadoop.hdfs.TestDFSStripedOutputStream |
|   | hadoop.hdfs.TestReadStripedFileWithDecodingDeletedData |
|   | hadoop.hdfs.TestDistributedFileSystem |
|   | hadoop.hdfs.TestReadStripedFileWithDNFailure |
|   | hadoop.hdfs.tools.TestECAdmin |
|   | hadoop.hdfs.TestDatanodeDeath |
|   | hadoop.hdfs.server.balancer.TestBalancer |
|   | hadoop.hdfs.TestReadStripedFileWithDecodingCorruptData |
|   | hadoop.hdfs.TestErasureCodingMultipleRacks |
|   | hadoop.hdfs.TestSnapshotCommands |
|   | hadoop.hdfs.TestFileAppend4 |
|   | hadoop.hdfs.TestFileChecksum |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithRandomECPolicy |
|   | hadoop.hdfs.TestErasureCodingPoliciesWithRandomECPolicy |
|   | hadoop.hdfs.TestAclsEndToEnd |
|   | hadoop.hdfs.TestDFSUpgradeFromImage |
|   | hadoop.hdfs.qjournal.client.TestQJMWithFaults |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/253/artifact/out/Dockerfile
 |
| JIRA Issue | HDFS-15644 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13013942/HDFS-15644.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux a960070b0bc5 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 7f8ef76c4833262f60cac2956aaa7fb75c0a77bc |
| Default Java | Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
|  Test Results | 
https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/253/testReport/ |
| Max. process+thread count | 3666 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/253/console |
| versions | git=2.17.1 maven=3.6.0 findbugs=4.1.3 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> Failed volumes can cause DNs to stop block reporting
> ----------------------------------------------------
>
>                 Key: HDFS-15644
>                 URL: https://issues.apache.org/jira/browse/HDFS-15644
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: block placement, datanode
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Major
>              Labels: refactor
>         Attachments: HDFS-15644.001.patch, HDFS-15644.002.patch
>
>
> [~daryn] found a corner case where remove failed volumes can cause a NPE in 
> [FsDataSetImpl.getBlockReports()|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L1939].
> +Scenario:+
>  * Inside {{Datanode#HandleVolumeFailures()}}, removing a failed volume is a 
> 2-step process.
>  ** First it's removed from from the volumes list
>  ** Later in time are the replicas scrubbed from the volume map
>  * A concurrent thread generating blockReports may access the replicaMap 
> accessing a non existing VolumeID.
> He made a fix for that and we have been using it on our clusters since 
> Hadoop-2.7.
> By analyzing the code, the bug is still applicable to Trunk.
>  * The path Datanode#removeVolumes() is safe because the two step process in 
> {{FsDataImpl.removeVolumes()}} 
> [FsDatasetImpl.java#L577|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L577]
>  is protected by {{datasetWriteLock}} .
>  * The path Datanode#handleVolumeFailures() is not safe because the failed 
> volume is removed from the list without acquiring 
> {{datasetWriteLock}}.[FsVolumList#239|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java#L239]
> The race condition can cause the caller of getBlockReports() to throw NPE if 
> the RUR is referring to a volume that has already been removed 
> [FsDatasetImpl.java#L1976|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L1976].
> {code:java}
>         case RUR:
>           ReplicaInfo orig = b.getOriginalReplica();
>           builders.get(volStorageID).add(orig);
>           break;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to