[ 
https://issues.apache.org/jira/browse/HADOOP-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-11604:
-----------------------------------
    Attachment: HADOOP-11604.004.patch

[~xieliang007], Happy Spring Festival to you too!  :-)

I'm attaching patch v004.  This adds a test, which was a bit tricky to write.  
Forcing the {{ConcurrentModificationException}} was easy, and I've added 
{{testCloseSocketOnWatcherClose}} to do this.  However, the exception is 
thrown on the watcher thread rather than on JUnit's own thread, so by itself 
it wouldn't cause the test to fail.  I've modified the test suite so that 
every test tracks whether the watcher thread terminated with an unexpected 
exception.  The new test fails without my changes in {{DomainSocketWatcher}} 
and passes once I apply the fix.
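
For illustration only, and not the code from the attached patch: a minimal Java sketch of how a test can notice that a background thread died with an unexpected exception.  It assumes an uncaught-exception handler is an acceptable way to record the failure.  The class name {{BackgroundFailureSketch}}, the helper methods, and the thread name are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class BackgroundFailureSketch {

  /**
   * Runs the task on a new thread, waits for it to finish, and returns the
   * uncaught exception that terminated it, or null if it exited normally.
   */
  public static Throwable runAndCapture(Runnable task) {
    final AtomicReference<Throwable> failure = new AtomicReference<>();
    Thread t = new Thread(task, "hypothetical-watcher");
    // Record the exception instead of letting it disappear with the thread.
    t.setUncaughtExceptionHandler((thread, e) -> failure.set(e));
    t.start();
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for thread", e);
    }
    return failure.get();
  }

  /** Removes from a list while iterating over it, which trips the fail-fast iterator. */
  public static void modifyWhileIterating() {
    List<Integer> entries = new ArrayList<>(Arrays.asList(1, 2, 3));
    for (Integer entry : entries) {
      entries.remove(entry);
    }
  }

  public static void main(String[] args) {
    Throwable failure = runAndCapture(BackgroundFailureSketch::modifyWhileIterating);
    System.out.println("captured: " + failure);
  }
}
```

A test built this way can assert that the captured value is null after each test method, which turns a silent thread death into a visible test failure.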

It occurs to me that we still probably don't know the real root cause of what 
happened in Liang's cluster.  Why did the thread exit prematurely?  This 
{{ConcurrentModificationException}} thrown from the {{finally}} block would 
have masked any exception thrown from the {{try}} body.  Probably the best we 
can do at this point is to get in this fix, and then watch for additional 
reports of the problem afterwards.
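
The masking effect described above can be reproduced in isolation.  The following is a minimal Java sketch, not taken from {{DomainSocketWatcher}}: the class name {{FinallyMaskingSketch}}, its methods, and the {{IllegalStateException}} standing in for the unknown original failure are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FinallyMaskingSketch {

  /** Hypothetical loop body that fails, followed by cleanup that also fails. */
  public static void runLoop() {
    List<Integer> entries = new ArrayList<>(Arrays.asList(1, 2, 3));
    try {
      // Stand-in for whatever originally went wrong in the try body.
      throw new IllegalStateException("original failure in try body");
    } finally {
      // Buggy cleanup: removing while iterating throws
      // ConcurrentModificationException, which replaces the exception above.
      for (Integer entry : entries) {
        entries.remove(entry);
      }
    }
  }

  /** Returns the exception that actually escapes runLoop(), or null if none does. */
  public static Throwable observed() {
    try {
      runLoop();
      return null;
    } catch (RuntimeException e) {
      return e;
    }
  }

  public static void main(String[] args) {
    Throwable t = observed();
    System.out.println("escaped: " + t.getClass().getName());
    // A plain finally block does not attach the original as a suppressed exception.
    System.out.println("suppressed count: " + t.getSuppressed().length);
  }
}
```

Only the exception from the {{finally}} block reaches the caller.  The original is neither the cause nor a suppressed exception, so a stack trace alone cannot reveal it.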

> Reach xceiver limit once the watcherThread die
> ----------------------------------------------
>
>                 Key: HADOOP-11604
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11604
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: net
>    Affects Versions: 2.6.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>            Priority: Critical
>         Attachments: HADOOP-11604-001.txt, HADOOP-11604-002.txt, 
> HADOOP-11604.003.patch, HADOOP-11604.004.patch
>
>
> Our production cluster hit the xceiver limit even with HADOOP-10404 and 
> HADOOP-11333 applied. I found it was caused by 
> DomainSocketWatcher.watcherThread having exited. Attached is a possible 
> fix; please review. Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
