[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284130#comment-13284130
 ] 

xieguiming commented on MAPREDUCE-5:
------------------------------------

I spent a whole day analyzing this problem, and here are some more details.
1> The TT throws the EofException and the IllegalStateException in 
getMapOutput.

2> Then I used the netstat command to check the HTTP port (50060) and found 
83 connections in the CLOSE_WAIT state. These CLOSE_WAIT connections never 
went away; they lasted at least 24 hours.

3> According to the TT log, once the exception is printed, the TT HTTP server 
stops working properly: it cannot accept any HTTP request (no "sent out" log 
entries appear afterwards), and the JT adds the TT to the blacklist. When I 
used the curl shell command to access the HTTP service, the client timed out, 
while the DataNode HTTP service on the same node was fine.

4> I also found that the TT CPU usage is 100% even when there are no child JVMs.

5> I also found that reduce tasks on the same node copy more slowly from other 
nodes.

6> After I restarted the TT, it worked well again.
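Items 2> and 3> describe two manual checks: netstat for CLOSE_WAIT sockets on 
port 50060, and curl against the TT HTTP service. The following is a minimal 
Python sketch that automates both, assuming Linux-style `netstat -tan` output 
(the column layout differs on OpenSolaris) and a TT listening on port 50060. 
The function names are hypothetical helpers, not part of Hadoop.

```python
import urllib.error
import urllib.request


def count_close_wait(netstat_output: str, port: int) -> int:
    """Count CLOSE_WAIT sockets whose *local* address uses `port`.

    Assumes Linux-style `netstat -tan` rows:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip banner/header lines and anything that is not a 6-column TCP row.
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        local_addr, state = fields[3], fields[5]
        # Only the server side (local port == port) matters here; connections
        # this node opens *to* another TT's port 50060 are ignored.
        if state == "CLOSE_WAIT" and local_addr.rsplit(":", 1)[-1] == str(port):
            count += 1
    return count


def http_responds(url: str, timeout_s: float = 5.0) -> bool:
    """Return True if `url` answers with any HTTP status within timeout_s."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s):
            return True
    except urllib.error.HTTPError:
        # The server sent a status line (e.g. 404/500/501): the listener is alive.
        return True
    except OSError:
        # Connection refused/reset, or no reply before the timeout
        # (URLError and socket timeouts are both OSError subclasses).
        return False
```

For example, pass the stdout of 
`subprocess.run(["netstat", "-tan"], capture_output=True, text=True)` to 
`count_close_wait(..., 50060)` and probe `http://<tt-host>:50060/` with 
`http_responds`. A CLOSE_WAIT count that stays high while the probe returns 
False would match the symptoms described in items 2> and 3>.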

I have attached the TT logs. If you need other logs, let me know. 
Unfortunately, we do not have the matching userlogs, because userlogs are 
deleted after only 3 hours, and many hours had passed by the time we found the 
problem.


> Shuffle's getMapOutput() fails with EofException, followed by 
> IllegalStateException
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>         Environment: Sun Java 1.6.0_13, OpenSolaris, running on a SunFire 
> 4150 (x64) 10 node cluster
>            Reporter: George Porter
>         Attachments: temp.rar
>
>
> During the shuffle phase, I'm seeing a large sequence of the following 
> actions:
> 1) WARN org.apache.hadoop.mapred.TaskTracker: 
> getMapOutput(attempt_200905181452_0002_m_000010_0,0) failed : 
> org.mortbay.jetty.EofException
> 2) WARN org.mortbay.log: Committed before 410 
> getMapOutput(attempt_200905181452_0002_m_000010_0,0) failed : 
> org.mortbay.jetty.EofException
> 3) ERROR org.mortbay.log: /mapOutput java.lang.IllegalStateException: 
> Committed
> The map phase completes with 100%, and then the reduce phase crawls along 
> with the above errors in each of the TaskTracker logs.  None of the 
> tasktrackers get lost.  When I run non-data jobs like the 'pi' test from the 
> example jar, everything works fine.
