Re: Never ending reduce jobs, error Error reading task outputConnection refused

Robert Evans Fri, 04 Nov 2011 08:40:40 -0700

I am not sure what is causing this, but yes, they are related.  In Hadoop the 
map output is served to the reducers through Jetty, which is an embedded web 
server.  If the reducers are not able to fetch the map outputs, they assume 
that the mapper is bad, and a new mapper is launched to recompute the map 
output.  From the errors it looks like the map output is being deleted or not 
showing up for some of the mappers.  I am not really sure why that would be 
happening.  What version of Hadoop are you using?
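
One way to narrow this down is to check the two things a reducer depends on: 
that each tasktracker's HTTP port accepts connections, and that the map output 
file is actually present in one of the local directories. The following is 
only a minimal sketch of those two checks, not how Hadoop itself does it. The 
helper names are hypothetical, and port 50060 is an assumption (I believe it 
is the default tasktracker HTTP port in this generation of Hadoop, but your 
configuration may override it).

```python
import os
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    A refused connection, a timeout, or a name-resolution failure all
    raise OSError (or a subclass) and are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_tasktrackers(hosts, port=50060, timeout=2.0):
    """Map each host name to whether its HTTP port is reachable.

    The default port 50060 is an assumption; pass your own value if the
    tasktracker HTTP address has been changed in your configuration.
    """
    return {host: port_is_open(host, port, timeout) for host in hosts}


def find_in_local_dirs(local_dirs, relative_path):
    """Return the first existing file local_dir/relative_path, else None.

    This only mirrors the idea behind the "Could not find ... in any of
    the configured local directories" error; it is not Hadoop's code.
    """
    for local_dir in local_dirs:
        candidate = os.path.join(local_dir, relative_path)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Run from any node, check_tasktrackers(["node1", "node2"]) (made-up host names) 
would show which tasktrackers refuse connections. Run on a tasktracker, 
find_in_local_dirs() with the directories listed in your local-dir setting and 
the relative path from the DiskErrorException would show whether the 
file.out.index is really missing. Being able to ssh between nodes does not 
tell you either of these things.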


--Bobby Evans

On 11/4/11 10:28 AM, "Russell Brown" <misterr...@gmail.com> wrote:

Hi,
I have a cluster of 4 tasktracker/datanodes and 1 JobTracker/Namenode. I can 
run small jobs on this cluster fine (like up to a few thousand keys) but more 
than that and I start seeing errors like this:


11/11/04 08:16:08 INFO mapred.JobClient: Task Id : attempt_201111040342_0006_m_000005_0, Status : FAILED
Too many fetch-failures
11/11/04 08:16:08 WARN mapred.JobClient: Error reading task outputConnection refused
11/11/04 08:16:08 WARN mapred.JobClient: Error reading task outputConnection refused
11/11/04 08:16:13 INFO mapred.JobClient:  map 97% reduce 1%
11/11/04 08:16:25 INFO mapred.JobClient:  map 100% reduce 1%
11/11/04 08:17:20 INFO mapred.JobClient: Task Id : attempt_201111040342_0006_m_000010_0, Status : FAILED
Too many fetch-failures
11/11/04 08:17:20 WARN mapred.JobClient: Error reading task outputConnection refused
11/11/04 08:17:20 WARN mapred.JobClient: Error reading task outputConnection refused
11/11/04 08:17:24 INFO mapred.JobClient:  map 97% reduce 1%
11/11/04 08:17:36 INFO mapred.JobClient:  map 100% reduce 1%
11/11/04 08:19:20 INFO mapred.JobClient: Task Id : attempt_201111040342_0006_m_000011_0, Status : FAILED
Too many fetch-failures



I have no idea what this means. All my nodes can ssh to each other 
passwordlessly, all the time.

On the individual data/task nodes the logs have errors like this:

2011-11-04 08:24:42,514 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_201111040342_0006_m_000015_0,2) failed :
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/vagrant/jobcache/job_201111040342_0006/attempt_201111040342_0006_m_000015_0/output/file.out.index in any of the configured local directories
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:429)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
        at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3543)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
        at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:816)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

2011-11-04 08:24:42,514 WARN org.apache.hadoop.mapred.TaskTracker: Unknown child with bad map output: attempt_201111040342_0006_m_000015_0. Ignored.


Are they related? What do any of them mean?

If I use a much smaller amount of data I don't see any of these errors and 
everything works fine, so I guess they have to do with some resource (though 
which one I don't know). Looking at 
MASTERNODE:50070/dfsnodelist.jsp?whatNodes=LIVE I see that the datanodes have 
ample disk space, so that isn't it...

Any help at all would be really appreciated. Searching for the errors on 
Google has got me nothing, and reading Hadoop: The Definitive Guide has got me 
nothing.

Many thanks in advance

Russell
