[ https://issues.apache.org/jira/browse/HADOOP-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327005#comment-14327005 ]
Liang Xie commented on HADOOP-11604:
------------------------------------
Thanks for all the valuable comments. After checking the out file, I saw the
ConcurrentModificationException being thrown inside the finally block:
{code}
for (Entry entry : entries.values()) {  // <<<< HERE
  sendCallback("close", entries, fdSet, entry.getDomainSocket().fd);
}
entries.clear();
{code}
The log looks something like this:
{code}
Exception in thread "Thread-25" java.util.ConcurrentModificationException
at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
at java.util.TreeMap$ValueIterator.next(TreeMap.java:1145)
at org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:484)
at java.lang.Thread.run(Thread.java:662)
{code}
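So the root cause in our case should be the unsafe pattern of removing
entries from the TreeMap while a foreach loop is still iterating over it:
foreach {treemap.remove}. The iterator is fail-fast, so the next call to
next() throws and the watcher thread dies.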
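
To illustrate the pattern outside of Hadoop, here is a minimal standalone sketch. It is not the DomainSocketWatcher code and not the attached patch; the class CloseAllSketch and the helper closeAndRemove are hypothetical stand-ins for a callback that removes the entry it handles. It shows the failing foreach and one possible way around it (iterating over a snapshot of the keys):

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.TreeMap;

// Hypothetical demo class, not part of Hadoop.
class CloseAllSketch {
    // Stand-in for a "close" callback that removes the entry it handles.
    static void closeAndRemove(TreeMap<Integer, String> entries, int fd,
                               List<Integer> closed) {
        closed.add(fd);
        entries.remove(fd);
    }

    // Unsafe: the map is structurally modified while the foreach iterator
    // is live. Returns true if ConcurrentModificationException was thrown.
    static boolean unsafeCloseAll(TreeMap<Integer, String> entries) {
        List<Integer> closed = new ArrayList<>();
        try {
            for (Integer fd : entries.keySet()) {
                closeAndRemove(entries, fd, closed);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // One possible alternative: iterate over a snapshot of the keys, so
    // removals from the map never touch the iterator in use.
    static List<Integer> safeCloseAll(TreeMap<Integer, String> entries) {
        List<Integer> closed = new ArrayList<>();
        for (Integer fd : new ArrayList<>(entries.keySet())) {
            closeAndRemove(entries, fd, closed);
        }
        entries.clear();
        return closed;
    }

    static TreeMap<Integer, String> sample() {
        TreeMap<Integer, String> m = new TreeMap<>();
        m.put(1, "a");
        m.put(2, "b");
        m.put(3, "c");
        return m;
    }

    public static void main(String[] args) {
        System.out.println("unsafe threw CME: " + unsafeCloseAll(sample()));
        System.out.println("safe closed: " + safeCloseAll(sample()));
    }
}
```

Note that no second thread is needed to trigger the exception; a single thread removing from the map in the loop body is enough.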
> Reach xceiver limit once the watcherThread die
> ----------------------------------------------
>
> Key: HADOOP-11604
> URL: https://issues.apache.org/jira/browse/HADOOP-11604
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: Liang Xie
> Assignee: Liang Xie
> Priority: Critical
> Attachments: HADOOP-11604-001.txt, HADOOP-11604-002.txt
>
>
> Our production cluster hit the xceiver limit even with HADOOP-10404 and
> HADOOP-11333. I found it was caused by DomainSocketWatcher.watcherThread
> having died. Attached is a possible fix; please review. Thanks.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)