[
https://issues.apache.org/jira/browse/HDFS-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564543#comment-14564543
]
Hadoop QA commented on HDFS-8496:
---------------------------------
\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 17m 34s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 26s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 2m 14s | The applied patch generated 1 new checkstyle issues (total was 124, now 120). |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 3m 13s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native | 3m 12s | Pre-build of native portion |
| {color:green}+1{color} | hdfs tests | 162m 57s | Tests passed in hadoop-hdfs. |
| | | 208m 44s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12736065/HDFS-8496-001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d725dd8 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11162/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11162/artifact/patchprocess/whitespace.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11162/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11162/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11162/console |
This message was automatically generated.
> Calling stopWriter() with FSDatasetImpl lock held may block other threads
> --------------------------------------------------------------------------
>
> Key: HDFS-8496
> URL: https://issues.apache.org/jira/browse/HDFS-8496
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: zhouyingchao
> Assignee: zhouyingchao
> Attachments: HDFS-8496-001.patch
>
>
> On a DN of an HDFS 2.6 cluster, we noticed that some DataXceiver threads and
> heartbeat threads were blocked for quite a while on the FSDatasetImpl lock.
> Looking at the stacks, we found that a call to stopWriter() made with the
> FSDatasetImpl lock held was blocking everything else.
> The following heartbeat stack shows, as an example, how threads are blocked
> on the FSDatasetImpl lock:
> {code}
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152)
> - waiting to lock <0x00000007701badc0> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getAvailable(FsVolumeImpl.java:191)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144)
> - locked <0x0000000770465dc0> (a java.lang.Object)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> The thread that holds the FSDatasetImpl lock is just sleeping in
> stopWriter(), waiting for another thread to exit. Its stack is:
> {code}
> java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Thread.join(Thread.java:1194)
> - locked <0x00000007636953b8> (a org.apache.hadoop.util.Daemon)
> at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverCheck(FsDatasetImpl.java:982)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverClose(FsDatasetImpl.java:1026)
> - locked <0x00000007701badc0> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:624)
> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> In this case, we had deployed quite a lot of other workloads on the DN, so
> the local file system and disks were quite busy. We guess this is why
> stopWriter() took such a long time.
> In any case, it is not reasonable to call stopWriter() with the
> FSDatasetImpl lock held. In HDFS-7999, createTemporary() was changed to call
> stopWriter() without holding the FSDatasetImpl lock. We think the same should
> be done in the other three methods: recoverClose(), recoverAppend(), and
> recoverRbw().
> I'll try to finish a patch for this today.
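For illustration, the pattern proposed above (wait for the writer without holding the dataset lock, then re-check the state under the lock) could be sketched as follows. This is a minimal, self-contained sketch and not the HDFS-8496 patch: the Replica class, the datasetLock field, and recover() are hypothetical stand-ins for ReplicaInPipeline, the FSDatasetImpl monitor, and the recover*() methods.

```java
final class StopWriterOutsideLock {

    /** Hypothetical stand-in for a replica that remembers its writer thread. */
    static final class Replica {
        private final Thread writer;

        Replica(Thread writer) {
            this.writer = writer;
        }

        Thread getWriter() {
            return writer;
        }

        /** Interrupts the writer and waits up to timeoutMs for it to exit. */
        void stopWriter(long timeoutMs) throws InterruptedException {
            Thread t = writer;
            if (t != null && t != Thread.currentThread() && t.isAlive()) {
                t.interrupt();
                t.join(timeoutMs); // potentially slow: must not run under the dataset lock
                if (t.isAlive()) {
                    throw new IllegalStateException("writer did not exit within " + timeoutMs + " ms");
                }
            }
        }
    }

    // Hypothetical stand-in for the dataset-wide monitor.
    private final Object datasetLock = new Object();
    private Replica current;

    void setReplica(Replica r) {
        synchronized (datasetLock) {
            current = r;
        }
    }

    /**
     * Returns the current replica once its writer is no longer running.
     * The slow join happens with the lock released; the state is then
     * re-read under the lock because it may have changed in the meantime.
     */
    Replica recover(long timeoutMs) throws InterruptedException {
        while (true) {
            Replica seen;
            synchronized (datasetLock) {
                seen = current;
                Thread w = seen.getWriter();
                if (w == null || w == Thread.currentThread() || !w.isAlive()) {
                    // No live writer: a real implementation would do the
                    // recovery work here, still holding the lock.
                    return seen;
                }
            }
            // Lock released: other threads (e.g. heartbeats) can proceed
            // while we wait for the writer to exit.
            seen.stopWriter(timeoutMs);
        }
    }

    public static void main(String[] args) throws Exception {
        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(60_000); // simulates a writer stuck on slow I/O
            } catch (InterruptedException e) {
                // interrupted by stopWriter(): exit
            }
        });
        writer.start();

        StopWriterOutsideLock dataset = new StopWriterOutsideLock();
        dataset.setReplica(new Replica(writer));
        Replica r = dataset.recover(5_000);
        System.out.println("writer alive after recover: " + r.getWriter().isAlive());
    }
}
```

The loop re-validates after every wait because the replica may have been replaced, or a new writer attached, while the lock was released. Whether the real recover*() methods need additional checks at that point is not covered by this sketch.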
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)