Hi,

Also, please make it a point to use only hostnames in your configuration. Hadoop works entirely on hostname-based configuration.
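For example, something along these lines (the addresses and hostnames below are only placeholders; substitute your own): put the same /etc/hosts entries on every node, and have the slaves file on the master list the workers by those names only:

  # /etc/hosts (identical on every node; placeholder addresses and names)
  192.168.1.10   master
  192.168.1.11   slave1
  192.168.1.12   slave2
  192.168.1.13   slave3
  192.168.1.14   slave4

  # conf/slaves on the master (one tasktracker/datanode per line, by hostname)
  slave1
  slave2
  slave3
  slave4

fs.default.name (core-site.xml) and mapred.job.tracker (mapred-site.xml) should also point at the master by hostname rather than by IP.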
Thanks,
Sudhan S

On Fri, Nov 4, 2011 at 9:39 PM, Russell Brown <misterr...@gmail.com> wrote:
> Done so, working. Awesome and many, many thanks!
>
> Cheers
>
> Russell
>
> On 4 Nov 2011, at 16:06, Uma Maheswara Rao G 72686 wrote:
>
>> ----- Original Message -----
>> From: Russell Brown <misterr...@gmail.com>
>> Date: Friday, November 4, 2011 9:18 pm
>> Subject: Re: Never ending reduce jobs, error Error reading task outputConnection refused
>> To: mapreduce-user@hadoop.apache.org
>>
>>> On 4 Nov 2011, at 15:44, Uma Maheswara Rao G 72686 wrote:
>>>
>>>> ----- Original Message -----
>>>> From: Russell Brown <misterr...@gmail.com>
>>>> Date: Friday, November 4, 2011 9:11 pm
>>>> Subject: Re: Never ending reduce jobs, error Error reading task outputConnection refused
>>>> To: mapreduce-user@hadoop.apache.org
>>>>
>>>>> On 4 Nov 2011, at 15:35, Uma Maheswara Rao G 72686 wrote:
>>>>>
>>>>>> This problem may come if you don't configure the host mappings properly. Can you check whether your tasktrackers are pingable from each other with the configured hostnames?
>>>>>
>>>>> Hi,
>>>>> Thanks for replying so fast!
>>>>>
>>>>> Hostnames? I use IP addresses in the slaves config file, and via IP addresses everyone can ping everyone else. Do I need to set up hostnames too?
>>>>
>>>> Yes, can you configure hostname mappings and check?
>>>
>>> Like full-blown DNS? I mean, there is no reference to any machine by hostname anywhere in my config, so I'm not sure where to start. These machines are just on my local network.
>>
>> You need to configure them in the /etc/hosts file, e.g.:
>>   xx.xx.xx.xx1 TT_HOSTNAME1
>>   xx.xx.xx.xx2 TT_HOSTNAME2
>>   xx.xx.xx.xx3 TT_HOSTNAME3
>>   xx.xx.xx.xx4 TT_HOSTNAME4
>> Configure them on all the machines and check.
>>
>>>>> Cheers
>>>>>
>>>>> Russell
>>>>>
>>>>>> Regards,
>>>>>> Uma
>>>>>>
>>>>>> ----- Original Message -----
>>>>>> From: Russell Brown <misterr...@gmail.com>
>>>>>> Date: Friday, November 4, 2011 9:00 pm
>>>>>> Subject: Never ending reduce jobs, error Error reading task outputConnection refused
>>>>>> To: mapreduce-user@hadoop.apache.org
>>>>>>
>>>>>>> Hi,
>>>>>>> I have a cluster of 4 tasktracker/datanodes and 1 JobTracker/Namenode.
>>>>>>> I can run small jobs on this cluster fine (like up to a few thousand keys), but more than that and I start seeing errors like this:
>>>>>>>
>>>>>>> 11/11/04 08:16:08 INFO mapred.JobClient: Task Id : attempt_201111040342_0006_m_000005_0, Status : FAILED
>>>>>>> Too many fetch-failures
>>>>>>> 11/11/04 08:16:08 WARN mapred.JobClient: Error reading task outputConnection refused
>>>>>>> 11/11/04 08:16:08 WARN mapred.JobClient: Error reading task outputConnection refused
>>>>>>> 11/11/04 08:16:13 INFO mapred.JobClient:  map 97% reduce 1%
>>>>>>> 11/11/04 08:16:25 INFO mapred.JobClient:  map 100% reduce 1%
>>>>>>> 11/11/04 08:17:20 INFO mapred.JobClient: Task Id : attempt_201111040342_0006_m_000010_0, Status : FAILED
>>>>>>> Too many fetch-failures
>>>>>>> 11/11/04 08:17:20 WARN mapred.JobClient: Error reading task outputConnection refused
>>>>>>> 11/11/04 08:17:20 WARN mapred.JobClient: Error reading task outputConnection refused
>>>>>>> 11/11/04 08:17:24 INFO mapred.JobClient:  map 97% reduce 1%
>>>>>>> 11/11/04 08:17:36 INFO mapred.JobClient:  map 100% reduce 1%
>>>>>>> 11/11/04 08:19:20 INFO mapred.JobClient: Task Id : attempt_201111040342_0006_m_000011_0, Status : FAILED
>>>>>>> Too many fetch-failures
>>>>>>>
>>>>>>> I have no idea what this means. All my nodes can ssh to each other, passwordlessly, all the time.
>>>>>>>
>>>>>>> On the individual data/task nodes the logs have errors like this:
>>>>>>>
>>>>>>> 2011-11-04 08:24:42,514 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_201111040342_0006_m_000015_0,2) failed :
>>>>>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/vagrant/jobcache/job_201111040342_0006/attempt_201111040342_0006_m_000015_0/output/file.out.index in any of the configured local directories
>>>>>>>         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:429)
>>>>>>>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
>>>>>>>         at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3543)
>>>>>>>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>>>>>>>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>>>>>>>         at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>>>>>>>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>>>>>>>         at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:816)
>>>>>>>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>>>>>>>         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>>>>>>>         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>>>>>>>         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>>>>>>>         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>>>>>>>         at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>>>>>>>         at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>>>>>>>         at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>>>>>>>         at org.mortbay.jetty.Server.handle(Server.java:326)
>>>>>>>         at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>>>>>>>         at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>>>>>>>         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>>>>>>>         at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>>>>>>>         at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>>>>>>>         at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
>>>>>>>         at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
>>>>>>>
>>>>>>> 2011-11-04 08:24:42,514 WARN org.apache.hadoop.mapred.TaskTracker: Unknown child with bad map output: attempt_201111040342_0006_m_000015_0. Ignored.
>>>>>>>
>>>>>>> Are they related? What do any of them mean?
>>>>>>>
>>>>>>> If I use a much smaller amount of data I don't see any of these errors and everything works fine, so I guess they are to do with some resource (though what, I don't know). Looking at MASTERNODE:50070/dfsnodelist.jsp?whatNodes=LIVE I see that the datanodes have ample disk space, so that isn't it…
>>>>>>>
>>>>>>> Any help at all is really appreciated. Searching for the errors on Google has got me nothing, and reading the Hadoop Definitive Guide has got me nothing.
>>>>>>>
>>>>>>> Many thanks in advance
>>>>>>>
>>>>>>> Russell
>>>>
>>>> Regards,
>>>> Uma
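P.S. for the archives: the reduce side fetches map output over HTTP from each tasktracker (the MapOutputServlet in the stack trace above, port 50060 by default), so a quick sanity check for the host mappings is to confirm, from every node, that every tasktracker hostname resolves and that its HTTP port answers. A rough sketch only (TT_HOSTNAME1..4 are placeholders for whatever you put in /etc/hosts):

  # run from each node in the cluster
  for h in TT_HOSTNAME1 TT_HOSTNAME2 TT_HOSTNAME3 TT_HOSTNAME4; do
    ping -c 1 "$h" > /dev/null && echo "$h: ping ok"              || echo "$h: ping FAILED"
    curl -s -o /dev/null "http://$h:50060/" && echo "$h: TT http ok" || echo "$h: TT http FAILED"
  done

If any of those fail from any node, you will keep seeing "Error reading task outputConnection refused" and "Too many fetch-failures".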