[ 
https://issues.apache.org/jira/browse/HDFS-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219196#comment-15219196
 ] 

Hadoop QA commented on HDFS-8496:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
56s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 47s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 19s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 4 new + 
140 unchanged - 3 fixed = 144 total (was 143) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 3 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 14s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 79m 16s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 25s 
{color} | {color:red} Patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 184m 29s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | hadoop.hdfs.TestFileCreationDelete |
|   | hadoop.hdfs.server.datanode.TestDataNodeMetrics |
|   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.TestDataTransferProtocol |
|   | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover |
|   | hadoop.hdfs.server.namenode.TestINodeFile |
|   | hadoop.hdfs.TestReadWhileWriting |
|   | hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode |
|   | hadoop.hdfs.TestSetTimes |
| JDK v1.8.0_77 Timed out junit tests | 
org.apache.hadoop.hdfs.TestLeaseRecovery2 |
|   | org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
|   | org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery |
|   | org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestWriteToReplica |
|   | org.apache.hadoop.hdfs.server.namenode.TestNamenodeRetryCache |
|   | org.apache.hadoop.hdfs.server.namenode.TestDeleteRace |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.metrics2.sink.TestRollingFileSystemSinkWithSecureHdfs |
|   | hadoop.hdfs.TestFileCorruption |
|   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.TestDataTransferProtocol |
|   | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover |
|   | hadoop.hdfs.TestReadWhileWriting |
|   | 
hadoop.hdfs.server.blockmanagement.TestReconstructStripedBlocksWithRackAwareness
 |
| JDK v1.7.0_95 Timed out junit tests | 
org.apache.hadoop.hdfs.TestLeaseRecovery2 |
|   | org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
|   | org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery |
|   | org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestWriteToReplica |
|   | org.apache.hadoop.hdfs.server.namenode.TestNamenodeRetryCache |
|   | org.apache.hadoop.hdfs.server.namenode.TestDeleteRace |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12796182/HDFS-8496.002.patch |
| JIRA Issue | HDFS-8496 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux be5334c45513 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / e4fc609 |
| Default Java | 1.7.0_95 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_77 
/usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15010/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15010/artifact/patchprocess/whitespace-eol.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15010/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15010/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-HDFS-Build/15010/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt
 
https://builds.apache.org/job/PreCommit-HDFS-Build/15010/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
 |
| JDK v1.7.0_95  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15010/testReport/ |
| asflicense | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15010/artifact/patchprocess/patch-asflicense-problems.txt
 |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15010/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Calling stopWriter() with FSDatasetImpl lock held may block other threads
> -------------------------------------------------------------------------
>
>                 Key: HDFS-8496
>                 URL: https://issues.apache.org/jira/browse/HDFS-8496
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: zhouyingchao
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-8496-001.patch, HDFS-8496.002.patch
>
>
> On a DN of a HDFS 2.6 cluster, we noticed some DataXceiver threads and  
> heartbeat threads are blocked for quite a while on the FSDatasetImpl lock. By 
> looking at the stack, we found the calling of stopWriter() with FSDatasetImpl 
> lock blocked everything.
> Following is the heartbeat stack, as an example, to show how threads are 
> blocked by FSDatasetImpl lock:
> {code}
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152)
>         - waiting to lock <0x00000007701badc0> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getAvailable(FsVolumeImpl.java:191)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144)
>         - locked <0x0000000770465dc0> (a java.lang.Object)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> The thread which held the FSDatasetImpl lock is just sleeping to wait another 
> thread to exit in stopWriter(). The stack is:
> {code}
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Thread.join(Thread.java:1194)
>         - locked <0x00000007636953b8> (a org.apache.hadoop.util.Daemon)
>         at 
> org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverCheck(FsDatasetImpl.java:982)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverClose(FsDatasetImpl.java:1026)
>         - locked <0x00000007701badc0> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:624)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> In this case, we deployed quite a lot other workloads on the DN, the local 
> file system and disk is quite busy. We guess this is why the stopWriter took 
> quite a long time.
> Any way, it is not quite reasonable to call stopWriter with the FSDatasetImpl 
> lock held.   In HDFS-7999, the createTemporary() is changed to call 
> stopWriter without FSDatasetImpl lock. We guess we should do so in the other 
> three methods: recoverClose()/recoverAppend/recoverRbw().
> I'll try to finish a patch for this today. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to