I have checked the log and found that for each map task there are 3 failures, which look like machine1 (failed) -> machine2 (failed) -> machine1 (failed) -> machine2 (succeeded). All failures are "Too many fetch failures". And I am sure there is no firewall between the two nodes; at least port 50060 can be accessed from a web browser.
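For a check that goes beyond a single browser test, here is a minimal sketch of a reachability probe; the hostnames are placeholders for whatever names the TaskTrackers actually register, and the default port 50060 comes from mapred.task.tracker.http.address. Run it on each node, pointed at the other:

#!/usr/bin/env python3
# Minimal TaskTracker reachability probe: run on EACH node, pointed at the
# other one, since reachability can differ by direction. Hostnames passed on
# the command line are placeholders; use the exact names the TaskTrackers
# register, because those are what reducers resolve during the shuffle.
import socket
import sys

PORT = 50060  # default TaskTracker HTTP port (mapred.task.tracker.http.address)

def probe(host):
    try:
        with socket.create_connection((host, PORT), timeout=5):
            print("%s:%d reachable" % (host, PORT))
            return True
    except OSError as err:
        print("%s:%d NOT reachable: %s" % (host, PORT, err))
        return False

if __name__ == "__main__":
    hosts = sys.argv[1:] or ["master", "slave"]  # hypothetical node names
    results = [probe(h) for h in hosts]
    sys.exit(0 if all(results) else 1)

If the probe succeeds in one direction but not the other, that asymmetry alone can produce "Too many fetch failures".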
How can I check whether the two nodes can fetch mapper outputs from one another? I have no idea how reducers fetch this data (a sketch of such a fetch follows the quoted thread below). Thanks!

On Wed, Apr 8, 2009 at 2:21 AM, Aaron Kimball <[email protected]> wrote:
> Xiaolin,
>
> Are you certain that the two nodes can fetch mapper outputs from one
> another? If it's taking that long to complete, it might be the case that
> what makes it "complete" is just that eventually it abandons one of your
> two nodes and runs everything on a single node where it succeeds --
> defeating the point, of course.
>
> Might there be a firewall between the two nodes that blocks the port used
> by the reducer to fetch the mapper outputs? (I think this is on 50060 by
> default.)
>
> - Aaron
>
> On Tue, Apr 7, 2009 at 8:08 AM, xiaolin guo <[email protected]> wrote:
>
> > This simple map-reduce application takes nearly 1 hour to finish
> > running on the two-node cluster, due to lots of Failed/Killed task
> > attempts, while on the single-node cluster the same application takes
> > only 1 minute. I am quite confused about why there are so many
> > Failed/Killed attempts.
> >
> > On Tue, Apr 7, 2009 at 10:40 PM, xiaolin guo <[email protected]> wrote:
> >
> > > I am trying to set up a small Hadoop cluster; everything was fine
> > > until I moved from a single-node cluster to a two-node cluster. I
> > > followed the article
> > > http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)
> > > to configure the master and slaves. However, when I tried to run the
> > > example wordcount map-reduce application, the reduce task got stuck
> > > at 19% for a long time. Then I got a notice: "INFO mapred.JobClient:
> > > TaskId : attempt_200904072219_0001_m_000002_0, Status : FAILED too
> > > many fetch errors" and an error message: Error reading task output
> > > slave.
> > >
> > > All map tasks on both task nodes had finished, which can be verified
> > > on the task tracker pages.
> > >
> > > Both nodes work well in single-node mode, and the Hadoop file system
> > > seems to be healthy in multi-node mode.
> > >
> > > Can anyone help me with this issue? I have been entangled in it for
> > > a long time ...
> > >
> > > Thanks very much!
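For what it's worth, reduce tasks pull map outputs over HTTP from each TaskTracker's embedded web server. Here is a sketch of such a fetch; the /mapOutput servlet path and its query parameters reflect my reading of the 0.20-era TaskTracker and should be treated as an assumption, and the host and job ID are placeholders (the map attempt ID is the failing one reported in the log above):

#!/usr/bin/env python3
# Imitate a reducer's shuffle fetch against a TaskTracker. The /mapOutput
# servlet path and its query parameters are an assumption based on the
# 0.20-era sources; the host and job ID are placeholders, and the map
# attempt ID is the failing one reported in the job log above.
import urllib.request

HOST = "slave"                                 # placeholder TaskTracker host
JOB = "job_200904072219_0001"                  # placeholder job ID
MAP = "attempt_200904072219_0001_m_000002_0"   # map attempt from the log
REDUCE = 0                                     # reduce partition number

url = ("http://%s:50060/mapOutput?job=%s&map=%s&reduce=%d"
       % (HOST, JOB, MAP, REDUCE))
try:
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = resp.read()
        print("HTTP", resp.status, "- fetched", len(data), "bytes")
except Exception as err:
    print("fetch failed:", err)

Running this from the machine hosting the reduce task, against the node that produced the map output, should show whether the fetch path itself works outside of Hadoop.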
