[
https://issues.apache.org/jira/browse/MAPREDUCE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284104#comment-13284104
]
xieguiming commented on MAPREDUCE-2386:
---------------------------------------
Hi,
One TT on my cluster is stuck in the same way. It is not responding to any HTTP
connections.
1> The thread stack info:
"1989360587@qtp-1863318328-0 - Acceptor0 [email protected]:10060"
prio=10 tid=0x00007fb9fc2a6800 nid=0x612e runnable [0x00007fba0015b000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
- locked <0x00007fba14758c70> (a sun.nio.ch.Util$1)
- locked <0x00007fba14758c58> (a java.util.Collections$UnmodifiableSet)
- locked <0x00007fba124d8aa8> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:88)
at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:652)
at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:192)
at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:708)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
2> I used netstat to check the state of port 50060 and found 83 connections in
the CLOSE_WAIT or SYN_RECV state:
tcp 0 0 172.16.4.7:50060 172.16.4.6:52526 SYN_RECV
tcp 0 0 172.16.4.7:50060 172.16.4.3:41380 SYN_RECV
tcp 0 0 172.16.4.7:50060 172.16.4.5:41908 SYN_RECV
tcp 0 0 172.16.4.7:50060 172.16.4.6:52495 SYN_RECV
tcp 0 0 172.16.4.7:50060 172.16.4.8:39167 SYN_RECV
tcp 0 0 172.16.4.7:50060 172.16.4.8:38799 SYN_RECV
tcp 0 0 172.16.4.7:50060 172.16.4.6:52416 SYN_RECV
tcp 0 0 172.16.4.7:50060 172.16.4.6:47010 SYN_RECV
tcp 0 0 172.16.4.7:50060 172.16.4.5:42449 SYN_RECV
tcp 0 0 172.16.4.7:50060 172.16.4.2:50107 SYN_RECV
tcp 0 0 172.16.4.7:50060 172.16.4.6:52558 SYN_RECV
tcp 0 0 172.16.4.7:50060 172.16.4.6:52402 SYN_RECV
tcp 0 0 172.16.4.7:50060 172.16.4.6:52085 SYN_RECV
tcp 0 0 172.16.4.7:50060 172.16.4.2:45092 SYN_RECV
tcp 0 0 172.16.4.7:50060 172.16.4.3:41542 SYN_RECV
tcp 0 0 172.16.4.7:50060 172.16.4.3:55977 SYN_RECV
tcp 0 0 172.16.4.7:50060 172.16.4.4:43743 SYN_RECV
tcp 0 0 172.16.4.7:50060 172.16.4.5:42118 SYN_RECV
tcp 0 0 172.16.4.7:50060 172.16.4.2:44535 SYN_RECV
tcp 0 0 172.16.4.7:50060 172.16.4.3:41890 SYN_RECV
tcp 0 0 172.16.4.7:50060 172.16.4.3:56001 SYN_RECV
tcp 0 0 172.16.4.7:50060 172.16.4.5:42057 SYN_RECV
tcp 0 0 172.16.4.7:50060 172.16.4.3:56121 SYN_RECV
tcp 0 0 172.16.4.7:50060 172.16.4.8:39173 SYN_RECV
tcp 0 0 172.16.4.7:50060 172.16.4.8:38937 SYN_RECV
tcp 0 0 172.16.4.7:50060 172.16.4.2:44992 SYN_RECV
tcp 129 0 :::50060 :::* LISTEN
tcp 243 0 172.16.4.7:50060 172.16.4.7:35878 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.7:50557 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.8:33735 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.6:40670 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.5:45702 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.3:50653 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.3:50538 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.6:48535 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.7:52049 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.5:45529 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.7:38282 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.7:51933 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.8:33008 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.2:50188 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.8:47068 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.3:50638 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.7:50629 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.3:50676 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.4:45076 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.7:37301 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.7:35873 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.8:33733 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.5:45487 CLOSE_WAIT
tcp 1 0 172.16.4.7:50060 172.16.4.8:47078 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.7:51939 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.3:50578 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.7:50630 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.1:35526 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.1:57037 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.6:52755 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.1:51096 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.2:50207 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.7:51951 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.7:35876 CLOSE_WAIT
tcp 1 0 172.16.4.7:50060 172.16.4.4:42804 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.6:52771 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.7:52110 CLOSE_WAIT
tcp 1 0 172.16.4.7:50060 172.16.4.4:42686 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.5:45688 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.3:50590 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.6:48497 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.7:37370 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.8:33010 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.7:51908 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.8:33003 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.5:45469 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.8:33002 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.8:33737 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.2:50198 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.6:52746 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.8:47067 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.7:37300 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.3:50705 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.7:38319 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.6:47550 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.1:56333 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.7:52004 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.8:47065 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.6:52814 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.8:33739 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.8:33734 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.8:47069 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.8:47063 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.7:38392 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.7:50716 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.4:45128 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.7:38317 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.8:33007 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.8:33006 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.8:33736 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.2:49722 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.2:50185 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.6:52820 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.5:45273 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.2:49730 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.3:49957 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.6:47477 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.5:45720 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.7:52011 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.7:52079 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.3:50583 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.7:52037 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.5:45437 CLOSE_WAIT
tcp 243 0 172.16.4.7:50060 172.16.4.2:50168 CLOSE_WAIT
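A listing like the one above is easier to compare across nodes as a per-state tally. The following is a minimal sketch, assuming the netstat lines have the six whitespace-separated fields shown here (protocol, Recv-Q, Send-Q, local address, foreign address, state). The function name `count_states` and the short `sample` text are made up for illustration and are not part of any Hadoop or netstat tooling.

```python
from collections import Counter


def count_states(netstat_text, port=50060):
    """Tally TCP states for sockets whose local address ends in :port."""
    counts = Counter()
    suffix = ":" + str(port)
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected fields: proto, Recv-Q, Send-Q, local addr, foreign addr, state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[3].endswith(suffix):
            counts[fields[5]] += 1
    return counts


# Hypothetical sample: a few lines copied from the listing above.
sample = """\
tcp 0 0 172.16.4.7:50060 172.16.4.6:52526 SYN_RECV
tcp 129 0 :::50060 :::* LISTEN
tcp 243 0 172.16.4.7:50060 172.16.4.7:35878 CLOSE_WAIT
tcp 1 0 172.16.4.7:50060 172.16.4.8:47078 CLOSE_WAIT
"""

if __name__ == "__main__":
    for state, n in count_states(sample).items():
        print(state, n)
```

Feeding the full netstat output to `count_states` gives the SYN_RECV and CLOSE_WAIT totals directly, without counting lines by hand.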
> TT jetty server stuck in tight loop around epoll_wait
> -----------------------------------------------------
>
> Key: MAPREDUCE-2386
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2386
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: tasktracker
> Affects Versions: 0.23.0
> Environment: RHEL 6.0 "Santiago"
> Reporter: Todd Lipcon
>
> In some load testing, I got a TaskTracker into a state where its Jetty server
> is in a tight loop calling epoll_wait, which is returning EINVAL:
> [pid 19573] epoll_wait(157, 40829000, 8192, 0) = -1 EINVAL (Invalid argument)
> It's not responding to any HTTP connections - connections are accepted and
> then just hang.
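To check whether a TT is in the tight loop described above, the failing epoll_wait calls in a captured strace log can be counted. The following is a minimal sketch, assuming the log lines have the same shape as the one quoted in the description. The name `count_epoll_errors` and the regular expression are illustrative assumptions, not part of strace or Hadoop.

```python
import re

# Matches lines such as:
# [pid 19573] epoll_wait(157, 40829000, 8192, 0) = -1 EINVAL (Invalid argument)
EPOLL_RE = re.compile(r"epoll_wait\((\d+),.*\)\s*=\s*-1\s+(E[A-Z]+)")


def count_epoll_errors(strace_text):
    """Count failed epoll_wait calls, keyed by (epoll fd, errno name)."""
    counts = {}
    for line in strace_text.splitlines():
        match = EPOLL_RE.search(line)
        if match:
            key = (int(match.group(1)), match.group(2))
            counts[key] = counts.get(key, 0) + 1
    return counts


# Hypothetical sample: the quoted strace line repeated three times.
sample = (
    "[pid 19573] epoll_wait(157, 40829000, 8192, 0) = -1 EINVAL (Invalid argument)\n"
    * 3
)

if __name__ == "__main__":
    for (fd, err), n in count_epoll_errors(sample).items():
        print(fd, err, n)
```

A large and fast-growing count for a single file descriptor with EINVAL would be consistent with the spinning acceptor thread. A count near zero would suggest a different cause.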