[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284104#comment-13284104
 ] 

xieguiming commented on MAPREDUCE-2386:
---------------------------------------

Hi,
One TT on my cluster is also stuck. It is not responding to any HTTP connections.

1> The thread stack info:

"1989360587@qtp-1863318328-0 - Acceptor0 [email protected]:10060" prio=10 tid=0x00007fb9fc2a6800 nid=0x612e runnable [0x00007fba0015b000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
        - locked <0x00007fba14758c70> (a sun.nio.ch.Util$1)
        - locked <0x00007fba14758c58> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00007fba124d8aa8> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:88)
        at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:652)
        at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:192)
        at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
        at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:708)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

2> I used netstat to check the state of port 50060 and found 83 connections in 
the CLOSE_WAIT or SYN_RECV state:
tcp        0      0 172.16.4.7:50060        172.16.4.6:52526        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.3:41380        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.5:41908        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.6:52495        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.8:39167        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.8:38799        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.6:52416        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.6:47010        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.5:42449        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.2:50107        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.6:52558        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.6:52402        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.6:52085        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.2:45092        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.3:41542        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.3:55977        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.4:43743        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.5:42118        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.2:44535        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.3:41890        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.3:56001        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.5:42057        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.3:56121        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.8:39173        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.8:38937        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.2:44992        SYN_RECV    
tcp      129      0 :::50060                :::*                    LISTEN      
tcp      243      0 172.16.4.7:50060        172.16.4.7:35878        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:50557        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:33735        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.6:40670        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.5:45702        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.3:50653        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.3:50538        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.6:48535        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:52049        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.5:45529        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:38282        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:51933        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:33008        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.2:50188        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:47068        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.3:50638        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:50629        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.3:50676        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.4:45076        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:37301        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:35873        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:33733        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.5:45487        CLOSE_WAIT  
tcp        1      0 172.16.4.7:50060        172.16.4.8:47078        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:51939        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.3:50578        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:50630        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.1:35526        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.1:57037        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.6:52755        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.1:51096        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.2:50207        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:51951        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:35876        CLOSE_WAIT  
tcp        1      0 172.16.4.7:50060        172.16.4.4:42804        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.6:52771        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:52110        CLOSE_WAIT  
tcp        1      0 172.16.4.7:50060        172.16.4.4:42686        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.5:45688        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.3:50590        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.6:48497        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:37370        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:33010        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:51908        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:33003        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.5:45469        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:33002        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:33737        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.2:50198        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.6:52746        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:47067        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:37300        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.3:50705        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:38319        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.6:47550        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.1:56333        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:52004        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:47065        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.6:52814        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:33739        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:33734        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:47069        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:47063        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:38392        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:50716        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.4:45128        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:38317        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:33007        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:33006        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:33736        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.2:49722        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.2:50185        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.6:52820        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.5:45273        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.2:49730        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.3:49957        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.6:47477        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.5:45720        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:52011        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:52079        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.3:50583        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:52037        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.5:45437        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.2:50168        CLOSE_WAIT  
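
A listing like the one above is easier to compare across nodes once it is 
tallied per state. The following is only a minimal sketch: the function name 
count_states is made up for illustration, and it assumes the six-column netstat 
layout shown above (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, 
State).

```python
from collections import Counter


def count_states(netstat_output, port=50060):
    """Tally TCP states for sockets whose local address ends in :port.

    Assumes the six-column layout shown in the listing above:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    counts = Counter()
    suffix = ":%d" % port
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines and anything that is not a TCP socket.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        # fields[3] is the local address, e.g. 172.16.4.7:50060 or :::50060
        if fields[3].endswith(suffix):
            counts[fields[5]] += 1
    return counts
```

Feeding it the text printed by "netstat -ant" on the affected node gives one 
count per state (SYN_RECV, CLOSE_WAIT, LISTEN, ...) for the TT HTTP port.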

                
> TT jetty server stuck in tight loop around epoll_wait
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2386
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2386
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.23.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>
> In some load testing, I got a TaskTracker into a state where its Jetty server 
> is in a tight loop calling epoll_wait, which is returning EINVAL:
> [pid 19573] epoll_wait(157, 40829000, 8192, 0) = -1 EINVAL (Invalid argument)
> It's not responding to any HTTP connections - connections are accepted and 
> then just hang.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
