[ 
https://issues.apache.org/jira/browse/HADOOP-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484509#comment-14484509
 ] 

Colin Patrick McCabe commented on HADOOP-11802:
-----------------------------------------------

Hi Eric,

You should never be in "the main finally block" of DomainSocketWatcher unless 
you are in a unit test.  If you are in this finally block in the actual 
DataNode, something is wrong.  You should see a string like "terminating on 
InterruptedException" or "terminating on IOException" explaining why you ended 
up in this finally block in the first place.  This should be the root cause.  
Do you have a log line like that?

> DomainSocketWatcher#watcherThread can encounter IllegalStateException in 
> finally block when calling sendCallback
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-11802
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11802
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>
> In the main finally block of the {{DomainSocketWatcher#watcherThread}}, the 
> call to {{sendCallback}} can encounter an {{IllegalStateException}}, and 
> leave some cleanup tasks undone.
> {code}
>       } finally {
>         lock.lock();
>         try {
>           kick(); // allow the handler for notificationSockets[0] to read a 
> byte
>           for (Entry entry : entries.values()) {
>             // We do not remove from entries as we iterate, because that can
>             // cause a ConcurrentModificationException.
>             sendCallback("close", entries, fdSet, entry.getDomainSocket().fd);
>           }
>           entries.clear();
>           fdSet.close();
>         } finally {
>           lock.unlock();
>         }
>       }
> {code}
> The exception causes {{watcherThread}} to skip the calls to 
> {{entries.clear()}} and {{fdSet.close()}}.
> {code}
> 2015-04-02 11:48:09,941 [DataXceiver for client 
> unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operation #1]] INFO 
> DataNode.clienttrace: cliID: DFSClient_NONMAPREDUCE_-807148576_1, src: 
> 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_SHM, shmId: n/a, srvID: 
> e6b6cdd7-1bf8-415f-a412-32d8493554df, success: false
> 2015-04-02 11:48:09,941 [Thread-14] ERROR unix.DomainSocketWatcher: 
> Thread[Thread-14,5,main] terminating on unexpected exception
> java.lang.IllegalStateException: failed to remove 
> b845649551b6b1eab5c17f630e42489d
>         at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>         at 
> org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.removeShm(ShortCircuitRegistry.java:119)
>         at 
> org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry$RegisteredShm.handle(ShortCircuitRegistry.java:102)
>         at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.sendCallback(DomainSocketWatcher.java:402)
>         at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.access$1100(DomainSocketWatcher.java:52)
>         at 
> org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:522)
>         at java.lang.Thread.run(Thread.java:722)
> {code}
> Please note that this is not a duplicate of HADOOP-11333, HADOOP-11604, or 
> HADOOP-10404. The cluster installation is running code with all of these 
> fixes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to