On 8 Nov 2011, at 03:35, Sudharsan Sampath wrote:

> Hi,
> 
> Also, please make it a point to use only hostnames in your configuration.
> Hadoop works entirely on hostname-based configuration.
Right, and now I know. Thanks again. This was weird in that it worked for small 
amounts of data, but not for large. Equally weird was the absence of any clue in 
the stack traces. You live and learn, eh?
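
For anyone who trips over the same thing, the change amounts to something like
the sketch below. The hostnames and addresses are made up and the port numbers
are only examples; the property names are the usual ones for 0.20.x-era Hadoop:

    # /etc/hosts on every node (master and all slaves)
    192.168.0.10  master1
    192.168.0.11  tthost1
    192.168.0.12  tthost2
    192.168.0.13  tthost3
    192.168.0.14  tthost4

    # conf/slaves on the master: one tasktracker/datanode hostname per line
    tthost1
    tthost2
    tthost3
    tthost4

    <!-- conf/core-site.xml -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://master1:9000</value>
    </property>

    <!-- conf/mapred-site.xml -->
    <property>
      <name>mapred.job.tracker</name>
      <value>master1:9001</value>
    </property>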

> 
> Thanks
> Sudhan S
> 
> On Fri, Nov 4, 2011 at 9:39 PM, Russell Brown <misterr...@gmail.com> wrote:
> Done so, working. Awesome, and many, many thanks!
> 
> Cheers
> 
> Russell
> On 4 Nov 2011, at 16:06, Uma Maheswara Rao G 72686 wrote:
> 
> > ----- Original Message -----
> > From: Russell Brown <misterr...@gmail.com>
> > Date: Friday, November 4, 2011 9:18 pm
> > Subject: Re: Never ending reduce jobs, error Error reading task 
> > outputConnection refused
> > To: mapreduce-user@hadoop.apache.org
> >
> >>
> >> On 4 Nov 2011, at 15:44, Uma Maheswara Rao G 72686 wrote:
> >>
> >>> ----- Original Message -----
> >>> From: Russell Brown <misterr...@gmail.com>
> >>> Date: Friday, November 4, 2011 9:11 pm
> >>> Subject: Re: Never ending reduce jobs, error Error reading task
> >> outputConnection refused
> >>> To: mapreduce-user@hadoop.apache.org
> >>>
> >>>>
> >>>> On 4 Nov 2011, at 15:35, Uma Maheswara Rao G 72686 wrote:
> >>>>
> >>>>> This problem may come if you don't configure the host mappings
> >>>>> properly. Can you check whether your tasktrackers are pingable
> >>>>> from each other with the configured host names?
> >>>>
> >>>>
> >>>> Hi,
> >>>> Thanks for replying so fast!
> >>>>
> >>>> Hostnames? I use IP addresses in the slaves config file, and via
> >>>> IP addresses everyone can ping everyone else. Do I need to set up
> >>>> hostnames too?
> >>> Yes, can you configure hostname mappings and check..
> >>
> >> Like full blown DNS? I mean there is no reference to any machine
> >> by hostname in any of my config anywhere, so I'm not sure where to
> >> start. These machines are just on my local network.
> > You need to configure them in the /etc/hosts file, e.g.:
> >    xx.xx.xx.xx1 TT_HOSTNAME1
> >    xx.xx.xx.xx2 TT_HOSTNAME2
> >    xx.xx.xx.xx3 TT_HOSTNAME3
> >    xx.xx.xx.xx4 TT_HOSTNAME4
> > Configure this on all the machines and check.
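> > Once the mappings are in place, a quick way to double-check from each box is
> > something like the helper below (illustrative only, not part of Hadoop; the
> > hostnames are the placeholders above):
> >
> > import java.net.InetAddress;
> >
> > // Resolve each configured hostname and try to reach it. Run it on every
> > // node. isReachable() may need ICMP privileges on some systems; a plain
> > // `ping TT_HOSTNAME1` from the shell is an equivalent check.
> > public class HostCheck {
> >     public static void main(String[] args) throws Exception {
> >         String[] hosts = {"TT_HOSTNAME1", "TT_HOSTNAME2",
> >                           "TT_HOSTNAME3", "TT_HOSTNAME4"};
> >         for (String h : hosts) {
> >             InetAddress a = InetAddress.getByName(h); // throws if the mapping is missing
> >             System.out.println(h + " -> " + a.getHostAddress()
> >                     + (a.isReachable(2000) ? " reachable" : " NOT reachable"));
> >         }
> >     }
> > }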
> >>
> >>>>
> >>>> Cheers
> >>>>
> >>>> Russell
> >>>>>
> >>>>> Regards,
> >>>>> Uma
> >>>>> ----- Original Message -----
> >>>>> From: Russell Brown <misterr...@gmail.com>
> >>>>> Date: Friday, November 4, 2011 9:00 pm
> >>>>> Subject: Never ending reduce jobs, error Error reading task
> >>>> outputConnection refused
> >>>>> To: mapreduce-user@hadoop.apache.org
> >>>>>
> >>>>>> Hi,
> >>>>>> I have a cluster of 4 tasktracker/datanodes and 1
> >>>>>> JobTracker/Namenode. I can run small jobs on this cluster fine
> >>>>>> (like up to a few thousand keys), but with more than that I start
> >>>>>> seeing errors like this:
> >>>>>>
> >>>>>>
> >>>>>> 11/11/04 08:16:08 INFO mapred.JobClient: Task Id :
> >>>>>> attempt_201111040342_0006_m_000005_0, Status : FAILED
> >>>>>> Too many fetch-failures
> >>>>>> 11/11/04 08:16:08 WARN mapred.JobClient: Error reading task
> >>>>>> outputConnection refused
> >>>>>> 11/11/04 08:16:08 WARN mapred.JobClient: Error reading task
> >>>>>> outputConnection refused
> >>>>>> 11/11/04 08:16:13 INFO mapred.JobClient:  map 97% reduce 1%
> >>>>>> 11/11/04 08:16:25 INFO mapred.JobClient:  map 100% reduce 1%
> >>>>>> 11/11/04 08:17:20 INFO mapred.JobClient: Task Id :
> >>>>>> attempt_201111040342_0006_m_000010_0, Status : FAILED
> >>>>>> Too many fetch-failures
> >>>>>> 11/11/04 08:17:20 WARN mapred.JobClient: Error reading task
> >>>>>> outputConnection refused
> >>>>>> 11/11/04 08:17:20 WARN mapred.JobClient: Error reading task
> >>>>>> outputConnection refused
> >>>>>> 11/11/04 08:17:24 INFO mapred.JobClient:  map 97% reduce 1%
> >>>>>> 11/11/04 08:17:36 INFO mapred.JobClient:  map 100% reduce 1%
> >>>>>> 11/11/04 08:19:20 INFO mapred.JobClient: Task Id :
> >>>>>> attempt_201111040342_0006_m_000011_0, Status : FAILED
> >>>>>> Too many fetch-failures
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> I have no IDEA what this means. All my nodes can ssh to each
> >>>>>> other passwordlessly, all the time.
> >>>>>>
> >>>>>> On the individual data/task nodes the logs have errors like this:
> >>>>>>
> >>>>>> 2011-11-04 08:24:42,514 WARN org.apache.hadoop.mapred.TaskTracker:
> >>>>>> getMapOutput(attempt_201111040342_0006_m_000015_0,2) failed :
> >>>>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> >>>>>> taskTracker/vagrant/jobcache/job_201111040342_0006/attempt_201111040342_0006_m_000015_0/output/file.out.index
> >>>>>> in any of the configured local directories
> >>>>>>   at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:429)
> >>>>>>   at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
> >>>>>>   at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3543)
> >>>>>>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> >>>>>>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> >>>>>>   at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> >>>>>>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> >>>>>>   at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:816)
> >>>>>>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> >>>>>>   at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> >>>>>>   at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> >>>>>>   at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> >>>>>>   at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> >>>>>>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> >>>>>>   at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> >>>>>>   at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> >>>>>>   at org.mortbay.jetty.Server.handle(Server.java:326)
> >>>>>>   at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> >>>>>>   at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> >>>>>>   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> >>>>>>   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> >>>>>>   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> >>>>>>   at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
> >>>>>>   at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> >>>>>>
> >>>>>> 2011-11-04 08:24:42,514 WARN org.apache.hadoop.mapred.TaskTracker:
> >>>>>> Unknown child with bad map output:
> >>>>>> attempt_201111040342_0006_m_000015_0. Ignored.
> >>>>>>
> >>>>>>
> >>>>>> Are they related? What do any of them mean?
> >>>>>>
> >>>>>> If I use a much smaller amount of data I don't see any of these
> >>>>>> errors and everything works fine, so I guess they are to do with
> >>>>>> some resource (though which one, I don't know). Looking at
> >>>>>> MASTERNODE:50070/dfsnodelist.jsp?whatNodes=LIVE
> >>>>>> I see that the datanodes have ample disk space, so that isn't it…
> >>>>>>
> >>>>>> Any help at all is really appreciated. Searching for the errors on
> >>>>>> Google got me nothing, and reading the Hadoop Definitive Guide got
> >>>>>> me nothing either.
> >>>>>> Many thanks in advance
> >>>>>>
> >>>>>> Russell
> >>>>
> >>>>
> >>> Regards,
> >>> Uma
> >>
> >>
> 
> 
