Piotr Kołaczkowski created MAPREDUCE-4506:
---------------------------------------------
Summary: EofException / 'connection reset by peer' while copying map output
Key: MAPREDUCE-4506
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4506
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 1.0.3
Environment: Ubuntu Linux 12.04 LTS, 64-bit, Java 6 update 33
Reporter: Piotr Kołaczkowski
Priority: Minor
When running complex MapReduce jobs with many mappers and reducers (e.g. 8
mappers and 8 reducers on an 8-core machine), the following exception
sometimes appears in the logs during the shuffle phase:
{noformat}
WARN [570516323@qtp-2060060479-164] 2012-07-19 02:50:21,229 TaskTracker.java (line 3894) getMapOutput(attempt_201207161621_0217_m_000071_0,0) failed :
org.mortbay.jetty.EofException
    at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:787)
    at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:568)
    at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1005)
    at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:648)
    at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:579)
    at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3872)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166)
    at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcher.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:72)
    at sun.nio.ch.IOUtil.write(IOUtil.java:43)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
    at org.mortbay.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:169)
    at org.mortbay.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:221)
    at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:721)
{noformat}
At first this looks like a network problem. However, it turns out that
Hadoop's shuffleInMemory sometimes deliberately closes a map-output-copy
connection, only to reopen it a few milliseconds later, because free memory
is temporarily unavailable. The sending side does not expect the connection
to be closed, so it throws an exception. This also wastes resources on the
sender, which does more work than necessary by serving the additional requests.
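
To make the cost concrete, here is a minimal single-threaded sketch of the pattern described above. It is not Hadoop's code: the class ShuffleReconnectSketch, the Sender counter, and both copy methods are hypothetical names invented for this illustration, and the memory budget is modelled with a plain java.util.concurrent.Semaphore. The first method mimics the close-and-reconnect behaviour; the second shows one possible alternative, assumed here only for comparison, in which the receiver keeps the connection open while it waits for memory.

```java
import java.util.concurrent.Semaphore;

// Illustration only: all names are hypothetical, not Hadoop APIs.
class ShuffleReconnectSketch {

    /** Stand-in for the sending side; it only counts what it has to do. */
    static final class Sender {
        int requestsServed = 0;    // requests for map output
        int abortedResponses = 0;  // responses cut off by the receiver
    }

    private final Semaphore freeBytes; // in-memory shuffle budget
    private final Sender sender;

    ShuffleReconnectSketch(int capacityBytes, Sender sender) {
        this.freeBytes = new Semaphore(capacityBytes);
        this.sender = sender;
    }

    /** Returns memory to the budget, e.g. after an in-memory merge. */
    void release(int bytes) {
        freeBytes.release(bytes);
    }

    /** Pattern from the report: drop the connection when memory is short. */
    void copyClosingOnPressure(int length, Runnable waitForMemory) {
        sender.requestsServed++;                      // initial request
        if (!freeBytes.tryAcquire(length)) {
            sender.abortedResponses++;                // sender sees a reset
            waitForMemory.run();                      // memory gets freed
            freeBytes.acquireUninterruptibly(length);
            sender.requestsServed++;                  // same output, again
        }
        // ... the map output would be copied into memory here ...
    }

    /** Assumed alternative: wait for memory on the open connection. */
    void copyWaitingOnOpenConnection(int length, Runnable waitForMemory) {
        sender.requestsServed++;
        if (!freeBytes.tryAcquire(length)) {
            waitForMemory.run();
            freeBytes.acquireUninterruptibly(length);
        }
        // ... the map output would be copied into memory here ...
    }

    public static void main(String[] args) {
        Sender s = new Sender();
        ShuffleReconnectSketch shuffle = new ShuffleReconnectSketch(100, s);
        shuffle.copyClosingOnPressure(80, () -> { });  // fits in memory
        shuffle.copyClosingOnPressure(50, () -> shuffle.release(80));
        System.out.println("requests=" + s.requestsServed
                + " aborted=" + s.abortedResponses);
    }
}
```

In this sketch the second copy costs the sender two requests and one aborted response, whereas the waiting variant costs one request and none. Holding the connection open has its own price, since it presumably keeps a sender-side thread busy while the receiver waits, so this is a trade-off to evaluate rather than a proposed patch.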