[
https://issues.apache.org/jira/browse/HDFS-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552147#comment-14552147
]
Hadoop QA commented on HDFS-8429:
---------------------------------
\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 39s | Pre-patch trunk compilation is
healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any
@author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear
to include any new or modified tests. Please justify why no new tests are
needed for this patch. Also please list what manual steps were performed to
verify this patch. |
| {color:green}+1{color} | javac | 7m 31s | There were no new javac warning
messages. |
| {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc
warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 5s | The applied patch generated 2
new checkstyle issues (total was 19, now 21). |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that
end in whitespace. |
| {color:green}+1{color} | install | 1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with
eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 40s | The patch does not introduce
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests | 22m 53s | Tests passed in
hadoop-common. |
| | | 60m 2s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL |
http://issues.apache.org/jira/secure/attachment/12734105/HDFS-8429-001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / ce53c8e |
| checkstyle |
https://builds.apache.org/job/PreCommit-HDFS-Build/11061/artifact/patchprocess/diffcheckstylehadoop-common.txt
|
| hadoop-common test log |
https://builds.apache.org/job/PreCommit-HDFS-Build/11061/artifact/patchprocess/testrun_hadoop-common.txt
|
| Test Results |
https://builds.apache.org/job/PreCommit-HDFS-Build/11061/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output |
https://builds.apache.org/job/PreCommit-HDFS-Build/11061/console |
This message was automatically generated.
> Death of watcherThread making other local read blocked
> ------------------------------------------------------
>
> Key: HDFS-8429
> URL: https://issues.apache.org/jira/browse/HDFS-8429
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: zhouyingchao
> Assignee: zhouyingchao
> Attachments: HDFS-8429-001.patch
>
>
> In our cluster, an application is hung when doing a short circuit read of
> local hdfs block. By looking into the log, we found the DataNode's
> DomainSocketWatcher.watcherThread has exited with following log:
> {code}
> ERROR org.apache.hadoop.net.unix.DomainSocketWatcher:
> Thread[Thread-25,5,main] terminating on unexpected exception
> java.lang.NullPointerException
> at
> org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:463)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> The line 463 is following code snippet:
> {code}
> try {
> for (int fd : fdSet.getAndClearReadableFds()) {
> sendCallbackAndRemove("getAndClearReadableFds", entries, fdSet,
> fd);
> }
> {code}
> getAndClearReadableFds is a native method which will malloc an int array.
> Since our memory is very tight, it looks like the malloc failed and a NULL
> pointer is returned.
> The bad thing is that other threads then blocked in stack like this:
> {code}
> "DataXceiver for client
> unix:/home/work/app/hdfs/c3prc-micloud/datanode/dn_socket [Waiting for
> operation #1]" daemon prio=10 tid=0x00007f0c9c086d90 nid=0x8fc3 waiting on
> condition [0x00007f09b9856000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000007b0174808> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
> at
> org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:323)
> at
> org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:403)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> IMO, we should exit the DN so that the users can know that something go
> wrong and fix it.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)