[ https://issues.apache.org/jira/browse/HDFS-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552147#comment-14552147 ]

Hadoop QA commented on HDFS-8429:
---------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 39s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac |   7m 31s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |   9m 42s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m  5s | The applied patch generated 2 new checkstyle issues (total was 19, now 21). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 40s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |  22m 53s | Tests passed in hadoop-common. |
| | |  60m  2s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12734105/HDFS-8429-001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / ce53c8e |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11061/artifact/patchprocess/diffcheckstylehadoop-common.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11061/artifact/patchprocess/testrun_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11061/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11061/console |


This message was automatically generated.

> Death of watcherThread making other local read blocked
> ------------------------------------------------------
>
>                 Key: HDFS-8429
>                 URL: https://issues.apache.org/jira/browse/HDFS-8429
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: zhouyingchao
>            Assignee: zhouyingchao
>         Attachments: HDFS-8429-001.patch
>
>
> In our cluster, an application hung while doing a short-circuit read of a 
> local HDFS block. Looking into the log, we found that the DataNode's 
> DomainSocketWatcher.watcherThread had exited with the following log:
> {code}
> ERROR org.apache.hadoop.net.unix.DomainSocketWatcher: Thread[Thread-25,5,main] terminating on unexpected exception
> java.lang.NullPointerException
>         at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:463)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> Line 463 is in the following code snippet:
> {code}
>          try {
>             for (int fd : fdSet.getAndClearReadableFds()) {
>               sendCallbackAndRemove("getAndClearReadableFds", entries, fdSet,
>                 fd);
>             }
> {code}
> getAndClearReadableFds is a native method that mallocs an int array. Since 
> memory on our nodes is very tight, it looks like the malloc failed and a 
> NULL pointer was returned.
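> A minimal self-contained illustration (not the actual DomainSocketWatcher 
> code) of why a null return from the native call surfaces as an NPE on the 
> for-each line; the stubbed getAndClearReadableFds below is only a stand-in 
> for the JNI method and returns null to mimic a failed malloc:
> {code}
> public class NullFdArrayDemo {
>   // Stand-in for the native fdSet.getAndClearReadableFds(); returns null
>   // here to mimic the JNI code failing to malloc the int array.
>   static int[] getAndClearReadableFds() {
>     return null;
>   }
> 
>   public static void main(String[] args) {
>     // The enhanced for loop reads the array's length before the body runs,
>     // so a null return throws NullPointerException right at the loop header,
>     // matching the trace at DomainSocketWatcher.java:463.
>     for (int fd : getAndClearReadableFds()) {
>       System.out.println("readable fd: " + fd);
>     }
>   }
> }
> {code}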
> The bad thing is that other threads are then blocked with stacks like this:
> {code}
> "DataXceiver for client 
> unix:/home/work/app/hdfs/c3prc-micloud/datanode/dn_socket [Waiting for 
> operation #1]" daemon prio=10 tid=0x00007f0c9c086d90 nid=0x8fc3 waiting on 
> condition [0x00007f09b9856000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000007b0174808> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>         at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:323)
>         at 
> org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:403)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
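> A minimal sketch of why those threads park forever, under the assumption 
> (a hypothetical simplification, not the real DomainSocketWatcher code) 
> that add() enqueues a request and then awaits a condition that only the 
> watcher thread signals; once the watcher thread has died, the queue never 
> drains and the await never returns:
> {code}
> import java.util.ArrayDeque;
> import java.util.Queue;
> import java.util.concurrent.locks.Condition;
> import java.util.concurrent.locks.ReentrantLock;
> 
> public class WatcherHandshakeSketch {
>   private final ReentrantLock lock = new ReentrantLock();
>   private final Condition processed = lock.newCondition();
>   private final Queue<Integer> toAdd = new ArrayDeque<Integer>();
> 
>   // Called by DataXceiver threads (e.g. from requestShortCircuitShm).
>   public void add(int fd) throws InterruptedException {
>     lock.lock();
>     try {
>       toAdd.add(fd);
>       while (toAdd.contains(fd)) {   // only the watcher thread removes it
>         processed.await();           // parks forever if the watcher is dead
>       }
>     } finally {
>       lock.unlock();
>     }
>   }
> 
>   // Watcher thread loop: drains the queue and signals waiters. Once it
>   // exits on an unexpected exception, nothing signals 'processed' again.
>   public void watcherLoop() {
>     while (true) {
>       lock.lock();
>       try {
>         toAdd.clear();
>         processed.signalAll();
>       } finally {
>         lock.unlock();
>       }
>       // ... poll(2) the fd set and dispatch callbacks ...
>     }
>   }
> }
> {code}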
> IMO, we should exit the DN so that users know something has gone wrong and 
> can fix it.
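> For illustration, a minimal sketch of what "exit the DN" could look like, 
> assuming the catch block around the watcher loop terminates the process via 
> Hadoop's ExitUtil helper; this only sketches the idea and is not necessarily 
> what HDFS-8429-001.patch does:
> {code}
> import org.apache.commons.logging.Log;
> import org.apache.commons.logging.LogFactory;
> import org.apache.hadoop.util.ExitUtil;
> 
> public class FailFastWatcherSketch {
>   private static final Log LOG = LogFactory.getLog(FailFastWatcherSketch.class);
> 
>   // Wraps the (hypothetical) poll/dispatch loop so an unexpected exception
>   // brings the whole process down loudly instead of silently killing only
>   // the watcher thread and leaving DataXceiver threads parked forever.
>   public static Thread newWatcherThread(final Runnable pollLoop) {
>     return new Thread(new Runnable() {
>       @Override
>       public void run() {
>         try {
>           pollLoop.run();
>         } catch (Throwable t) {
>           LOG.error("DomainSocketWatcher thread terminating on unexpected exception", t);
>           ExitUtil.terminate(1, t);
>         }
>       }
>     }, "DomainSocketWatcher-sketch");
>   }
> }
> {code}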



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
