Done so, and it's working. Awesome, many many thanks! Cheers
Russell

On 4 Nov 2011, at 16:06, Uma Maheswara Rao G 72686 wrote:

> ----- Original Message -----
> From: Russell Brown <misterr...@gmail.com>
> Date: Friday, November 4, 2011 9:18 pm
> Subject: Re: Never ending reduce jobs, error Error reading task outputConnection refused
> To: mapreduce-user@hadoop.apache.org
>
>> On 4 Nov 2011, at 15:44, Uma Maheswara Rao G 72686 wrote:
>>
>>> ----- Original Message -----
>>> From: Russell Brown <misterr...@gmail.com>
>>> Date: Friday, November 4, 2011 9:11 pm
>>> Subject: Re: Never ending reduce jobs, error Error reading task outputConnection refused
>>> To: mapreduce-user@hadoop.apache.org
>>>
>>>> On 4 Nov 2011, at 15:35, Uma Maheswara Rao G 72686 wrote:
>>>>
>>>>> This problem may come if you don't configure the host mappings properly. Can you check whether your tasktrackers are pingable from each other with the configured hostnames?
>>>>
>>>> Hi,
>>>> Thanks for replying so fast!
>>>>
>>>> Hostnames? I use IP addresses in the slaves config file, and via IP addresses everyone can ping everyone else. Do I need to set up hostnames too?
>>>
>>> Yes, can you configure hostname mappings and check?
>>
>> Like full-blown DNS? I mean, there is no reference to any machine by hostname anywhere in my config, so I'm not sure where to start. These machines are just on my local network.
>
> You need to configure them in the /etc/hosts file, e.g.:
>
>   xx.xx.xx.xx1 TT_HOSTNAME1
>   xx.xx.xx.xx2 TT_HOSTNAME2
>   xx.xx.xx.xx3 TT_HOSTNAME3
>   xx.xx.xx.xx4 TT_HOSTNAME4
>
> Configure them on all the machines and check.
>
>>>> Cheers
>>>>
>>>> Russell
>>>>
>>>>> Regards,
>>>>> Uma
>>>>>
>>>>> ----- Original Message -----
>>>>> From: Russell Brown <misterr...@gmail.com>
>>>>> Date: Friday, November 4, 2011 9:00 pm
>>>>> Subject: Never ending reduce jobs, error Error reading task outputConnection refused
>>>>> To: mapreduce-user@hadoop.apache.org
>>>>>
>>>>>> Hi,
>>>>>> I have a cluster of 4 tasktracker/datanodes and 1 JobTracker/Namenode. I can run small jobs on this cluster fine (up to a few thousand keys), but beyond that I start seeing errors like this:
>>>>>>
>>>>>> 11/11/04 08:16:08 INFO mapred.JobClient: Task Id : attempt_201111040342_0006_m_000005_0, Status : FAILED
>>>>>> Too many fetch-failures
>>>>>> 11/11/04 08:16:08 WARN mapred.JobClient: Error reading task outputConnection refused
>>>>>> 11/11/04 08:16:08 WARN mapred.JobClient: Error reading task outputConnection refused
>>>>>> 11/11/04 08:16:13 INFO mapred.JobClient: map 97% reduce 1%
>>>>>> 11/11/04 08:16:25 INFO mapred.JobClient: map 100% reduce 1%
>>>>>> 11/11/04 08:17:20 INFO mapred.JobClient: Task Id : attempt_201111040342_0006_m_000010_0, Status : FAILED
>>>>>> Too many fetch-failures
>>>>>> 11/11/04 08:17:20 WARN mapred.JobClient: Error reading task outputConnection refused
>>>>>> 11/11/04 08:17:20 WARN mapred.JobClient: Error reading task outputConnection refused
>>>>>> 11/11/04 08:17:24 INFO mapred.JobClient: map 97% reduce 1%
>>>>>> 11/11/04 08:17:36 INFO mapred.JobClient: map 100% reduce 1%
>>>>>> 11/11/04 08:19:20 INFO mapred.JobClient: Task Id : attempt_201111040342_0006_m_000011_0, Status : FAILED
>>>>>> Too many fetch-failures
>>>>>>
>>>>>> I have no idea what this means. All my nodes can ssh to each other, passwordlessly, all the time.
>>>>>>
>>>>>> On the individual data/task nodes the logs have errors like this:
>>>>>>
>>>>>> 2011-11-04 08:24:42,514 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_201111040342_0006_m_000015_0,2) failed :
>>>>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/vagrant/jobcache/job_201111040342_0006/attempt_201111040342_0006_m_000015_0/output/file.out.index in any of the configured local directories
>>>>>>     at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:429)
>>>>>>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
>>>>>>     at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3543)
>>>>>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>>>>>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>>>>>>     at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>>>>>>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>>>>>>     at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:816)
>>>>>>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>>>>>>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>>>>>>     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>>>>>>     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>>>>>>     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>>>>>>     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>>>>>>     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>>>>>>     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>>>>>>     at org.mortbay.jetty.Server.handle(Server.java:326)
>>>>>>     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>>>>>>     at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>>>>>>     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>>>>>>     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>>>>>>     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>>>>>>     at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
>>>>>>     at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
>>>>>>
>>>>>> 2011-11-04 08:24:42,514 WARN org.apache.hadoop.mapred.TaskTracker: Unknown child with bad map output: attempt_201111040342_0006_m_000015_0. Ignored.
>>>>>>
>>>>>> Are they related? What do any of them mean?
>>>>>>
>>>>>> If I use a much smaller amount of data I don't see any of these errors and everything works fine, so I guess they are to do with some resource (though what, I don't know). Looking at MASTERNODE:50070/dfsnodelist.jsp?whatNodes=LIVE I see that the datanodes have ample disk space, so that isn't it…
>>>>>>
>>>>>> Any help at all really appreciated. Searching for the errors on Google got me nothing; reading the Hadoop definitive guide got me nothing.
>>>>>> Many thanks in advance
>>>>>>
>>>>>> Russell
>>>
>>> Regards,
>>> Uma
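For anyone hitting the same wall, here is a minimal sketch of the mapping Uma describes. The addresses and hostnames below are made up for illustration; substitute your own master and four tasktracker/datanodes, and keep the file identical on every machine:

    # /etc/hosts -- same copy on the master and on every slave
    # (hypothetical addresses and names; replace with your own)
    127.0.0.1      localhost
    192.168.56.10  master    # JobTracker/Namenode
    192.168.56.11  slave1    # tasktracker/datanode 1
    192.168.56.12  slave2    # tasktracker/datanode 2
    192.168.56.13  slave3    # tasktracker/datanode 3
    192.168.56.14  slave4    # tasktracker/datanode 4

After copying it everywhere, verify that every node can reach every other node by name, not just by IP, for example from slave1:

    ping -c 1 master
    ping -c 1 slave2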
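As a rough explanation of why this fixes the errors above: in Hadoop 0.20/1.x the reducers fetch each map's output over HTTP from the tasktracker's embedded Jetty server (the MapOutputServlet visible in the stack trace), and tasktrackers advertise themselves by hostname. If those hostnames don't resolve, every fetch fails with "Connection refused" and eventually "Too many fetch-failures"; the DiskChecker "Could not find ... file.out.index" warning is likely a knock-on effect once the failed map's output has been cleaned up and re-run. A quick check that each tasktracker's HTTP port is reachable by name (50060 is the default for mapred.task.tracker.http.address; the slave names here are the hypothetical ones from the sketch above):

    for h in slave1 slave2 slave3 slave4; do
      curl -s -o /dev/null -w "%{http_code} $h\n" http://$h:50060/
    done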