On 4 Nov 2011, at 15:35, Uma Maheswara Rao G 72686 wrote:

> This problem may come if you don't configure the host mappings properly.
> Can you check whether your tasktrackers are pingable from each other with
> the configured host names?
Hi,

Thanks for replying so fast! Hostnames? I use IP addresses in the slaves
config file, and via IP addresses everyone can ping everyone else. Do I
need to set up hostnames too?
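
Just so I'm sure what you mean by host mappings: is it something like this
in /etc/hosts on every node, with the same names then used in the masters
and slaves files? (The names and addresses below are made-up examples, not
my actual machines.)

    # /etc/hosts, identical on all five nodes (example addresses only)
    192.168.1.10   master   # JobTracker/Namenode
    192.168.1.11   slave1   # TaskTracker/Datanode
    192.168.1.12   slave2
    192.168.1.13   slave3
    192.168.1.14   slave4

And is the test then that, from every node, every other node answers to its
configured name?

    ping -c 1 master
    ping -c 1 slave1    # ...and likewise for slave2, slave3, slave4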
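
If the names all resolve, is there anything beyond plain ping worth
checking? From what I've read, the reducers fetch map output from each
tasktracker over HTTP (tasktracker.http.address, port 50060 by default, if
I have that right), which would seem to fit the "Connection refused"
messages in my log below. So perhaps something like this, from each node
against every other node:

    # any HTTP response at all (even an error page) means the port is
    # reachable; "Connection refused" here means it is not
    curl -v http://slave1:50060/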
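
The other thing I wondered about is the DiskChecker error further down,
which says a map output file could not be found "in any of the configured
local directories". If I understand the docs, those directories come from
mapred.local.dir, which defaults to ${hadoop.tmp.dir}/mapred/local, and
hadoop.tmp.dir itself defaults to a directory under /tmp that the OS is
free to clean out. Would it be worth pointing it at a persistent disk in
mapred-site.xml, something like this (the path is just an example)?

    <!-- mapred-site.xml: where tasktrackers keep intermediate map output -->
    <property>
      <name>mapred.local.dir</name>
      <value>/var/hadoop/mapred/local</value>
    </property>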
Cheers

Russell

> Regards,
> Uma
>
> ----- Original Message -----
> From: Russell Brown <misterr...@gmail.com>
> Date: Friday, November 4, 2011 9:00 pm
> Subject: Never ending reduce jobs, error Error reading task outputConnection refused
> To: mapreduce-user@hadoop.apache.org
>
>> Hi,
>> I have a cluster of 4 tasktracker/datanodes and 1 JobTracker/Namenode. I
>> can run small jobs on this cluster fine (up to a few thousand keys), but
>> beyond that I start seeing errors like this:
>>
>> 11/11/04 08:16:08 INFO mapred.JobClient: Task Id : attempt_201111040342_0006_m_000005_0, Status : FAILED
>> Too many fetch-failures
>> 11/11/04 08:16:08 WARN mapred.JobClient: Error reading task outputConnection refused
>> 11/11/04 08:16:08 WARN mapred.JobClient: Error reading task outputConnection refused
>> 11/11/04 08:16:13 INFO mapred.JobClient: map 97% reduce 1%
>> 11/11/04 08:16:25 INFO mapred.JobClient: map 100% reduce 1%
>> 11/11/04 08:17:20 INFO mapred.JobClient: Task Id : attempt_201111040342_0006_m_000010_0, Status : FAILED
>> Too many fetch-failures
>> 11/11/04 08:17:20 WARN mapred.JobClient: Error reading task outputConnection refused
>> 11/11/04 08:17:20 WARN mapred.JobClient: Error reading task outputConnection refused
>> 11/11/04 08:17:24 INFO mapred.JobClient: map 97% reduce 1%
>> 11/11/04 08:17:36 INFO mapred.JobClient: map 100% reduce 1%
>> 11/11/04 08:19:20 INFO mapred.JobClient: Task Id : attempt_201111040342_0006_m_000011_0, Status : FAILED
>> Too many fetch-failures
>>
>> I have no IDEA what this means. All my nodes can ssh to each other,
>> passwordlessly, all the time.
>>
>> On the individual data/task nodes the logs have errors like this:
>>
>> 2011-11-04 08:24:42,514 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_201111040342_0006_m_000015_0,2) failed :
>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/vagrant/jobcache/job_201111040342_0006/attempt_201111040342_0006_m_000015_0/output/file.out.index in any of the configured local directories
>>     at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:429)
>>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
>>     at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3543)
>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>>     at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>>     at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:816)
>>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>>     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>>     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>>     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>>     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>>     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>>     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>>     at org.mortbay.jetty.Server.handle(Server.java:326)
>>     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>>     at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>>     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>>     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>>     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>>     at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
>>     at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
>>
>> 2011-11-04 08:24:42,514 WARN org.apache.hadoop.mapred.TaskTracker: Unknown child with bad map output: attempt_201111040342_0006_m_000015_0. Ignored.
>>
>> Are they related? What do any of them mean?
>>
>> If I use a much smaller amount of data I don't see any of these errors
>> and everything works fine, so I guess they have to do with some resource
>> (though which one, I don't know). Looking at
>> MASTERNODE:50070/dfsnodelist.jsp?whatNodes=LIVE I see that the datanodes
>> have ample disk space, so that isn't it…
>>
>> Any help at all really appreciated. Searching for the errors on Google
>> got me nothing, and reading the Hadoop Definitive Guide got me nothing
>> either.
>>
>> Many thanks in advance
>>
>> Russell