[
https://issues.apache.org/jira/browse/MAPREDUCE-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284130#comment-13284130
]
xieguiming commented on MAPREDUCE-5:
------------------------------------
I spent a full day analyzing this problem, and here are some more details.
1> The TT throws the EofException and the IllegalStateException for
getMapOutput.
2> I then used the netstat command to check the HTTP port (50060) and found
83 connections in the CLOSE_WAIT state. These connections never go away; they
stayed in CLOSE_WAIT for at least 24 hours.
3> From the TT log: after the exception is printed, the TT HTTP server stops
working properly. It cannot accept any HTTP request (no "sent out" log entries
appear afterwards), and the JT adds the TT to the blacklist. When I use curl
to access the HTTP service, the client times out, while the DataNode HTTP
service on the same node is fine.
4> I also found that the TT CPU usage is 100% even when there is no child JVM.
5> I also found that reduce tasks on the same node copy more slowly from other
nodes.
6> After I restarted the TT, it worked well again.
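The netstat check in item 2 can be scripted so that it is easy to repeat on
each node. Below is a minimal sketch in Python, assuming Linux-style
`netstat -tan` output (protocol, two queue columns, local address, remote
address, state). The function name and the sample lines are hypothetical and
are not taken from the attached logs.

```python
def count_close_wait(netstat_output, port=50060):
    """Count TCP connections on local `port` that are in CLOSE_WAIT."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local, remote, state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_addr, state = fields[3], fields[5]
        if local_addr.rsplit(":", 1)[-1] == str(port) and state == "CLOSE_WAIT":
            count += 1
    return count


# Hypothetical sample, for illustration only.
SAMPLE = """\
tcp        0      0 10.0.0.5:50060     10.0.0.7:41822     CLOSE_WAIT
tcp        1      0 10.0.0.5:50060     10.0.0.8:53310     CLOSE_WAIT
tcp        0      0 10.0.0.5:50060     10.0.0.9:40112     ESTABLISHED
tcp        0      0 10.0.0.5:50075     10.0.0.7:41900     CLOSE_WAIT
tcp        0      0 0.0.0.0:50060      0.0.0.0:*          LISTEN
"""

print(count_close_wait(SAMPLE))  # prints 2
```

On a live node, the text passed in could come from something like
`subprocess.check_output(["netstat", "-tan"], text=True)`. The column layout
differs between platforms, so the field indexes may need adjusting.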
I have attached the TT logs. If you need other logs, tell me. Unfortunately we
do not have the matching userlogs, because userlogs are deleted after only
3 hours, and many hours had already passed by the time we found the problem.
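The curl check from item 3 can also be reproduced with a small probe that
tells a hung HTTP server (connection accepted, no response) apart from a
closed port. This is only a sketch: the function name, the default path and
the timeout value are assumptions, and only the port number 50060 comes from
the observations above.

```python
import http.client
import socket


def probe_http(host, port=50060, path="/", timeout=5.0):
    """Send one GET and classify the outcome.

    Returns 'ok' if any HTTP response arrives (whatever its status code),
    'timeout' if the server does not answer in time, 'refused' if nothing
    is listening, and 'error' for any other failure.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        conn.getresponse().read()
        return "ok"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except (OSError, http.client.HTTPException):
        return "error"
    finally:
        conn.close()
```

A result of 'timeout' on port 50060, with 'ok' from the DataNode HTTP port on
the same host, would match the behavior described in item 3.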
> Shuffle's getMapOutput() fails with EofException, followed by
> IllegalStateException
> -----------------------------------------------------------------------------------
>
> Key: MAPREDUCE-5
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 0.20.2
> Environment: Sun Java 1.6.0_13, OpenSolaris, running on a SunFire
> 4150 (x64) 10 node cluster
> Reporter: George Porter
> Attachments: temp.rar
>
>
> During the shuffle phase, I'm seeing a large sequence of the following
> actions:
> 1) WARN org.apache.hadoop.mapred.TaskTracker:
> getMapOutput(attempt_200905181452_0002_m_000010_0,0) failed :
> org.mortbay.jetty.EofException
> 2) WARN org.mortbay.log: Committed before 410
> getMapOutput(attempt_200905181452_0002_m_000010_0,0) failed :
> org.mortbay.jetty.EofException
> 3) ERROR org.mortbay.log: /mapOutput java.lang.IllegalStateException:
> Committed
> The map phase completes with 100%, and then the reduce phase crawls along
> with the above errors in each of the TaskTracker logs. None of the
> tasktrackers get lost. When I run non-data jobs like the 'pi' test from the
> example jar, everything works fine.