Fixed the problem.
The problem was that one of the nodes could not resolve the hostname of the
other node. Even if I use IP addresses in the masters and slaves files,
Hadoop will use the hostname of the node instead of the IP address ...
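
The fix is to make sure each node can resolve the other's hostname, for
example by putting matching entries (the names and addresses below are just
placeholders) such as "192.168.0.1  master" and "192.168.0.2  slave" into
/etc/hosts on both machines. A quick way to verify resolution from either
node is a small check like this sketch (the hostname "slave" is only an
example, not the real name of my node):

    // ResolveCheck.java: a minimal sketch to verify hostname resolution.
    // Pass the other node's hostname as the first argument; "slave" is a
    // placeholder default, not the actual name of a node in this cluster.
    import java.net.InetAddress;

    public class ResolveCheck {
        public static void main(String[] args) throws Exception {
            String host = args.length > 0 ? args[0] : "slave";
            InetAddress addr = InetAddress.getByName(host); // forward lookup
            System.out.println(host + " -> " + addr.getHostAddress());
            // Reverse lookup: the address should map back to the same name,
            // since Hadoop identifies nodes by hostname, not by IP address.
            System.out.println(addr.getHostAddress() + " -> "
                    + addr.getCanonicalHostName());
        }
    }

If the forward or reverse lookup fails, or returns a different name, the
reducers' fetch requests end up addressed to a hostname the other node
cannot reach, which matches the "Too many fetch failures" symptom below.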

On Wed, Apr 8, 2009 at 7:26 PM, xiaolin guo <[email protected]> wrote:

> I have checked the log and found that for each map task there are 3
> failures, which look like machine1 (failed) -> machine2 (failed) ->
> machine1 (failed) -> machine2 (succeeded). All failures are "Too many
> fetch failures". And I am sure there is no firewall between the two
> nodes; at least port 50060 can be accessed from a web browser.
>
> How can I check whether the two nodes can fetch mapper outputs from one
> another? I have no idea how reducers fetch this data ...
>
> Thanks!
>
>
> On Wed, Apr 8, 2009 at 2:21 AM, Aaron Kimball <[email protected]> wrote:
>
>> Xiaolin,
>>
>> Are you certain that the two nodes can fetch mapper outputs from one
>> another? If it's taking that long to complete, it might be the case that
>> what makes it "complete" is just that eventually it abandons one of your
>> two
>> nodes and runs everything on a single node where it succeeds -- defeating
>> the point, of course.
>>
>> Might there be a firewall between the two nodes that blocks the port used
>> by
>> the reducer to fetch the mapper outputs? (I think this is on 50060 by
>> default.)
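>>
>> A quick way to test this (just a rough sketch; "slave" is a placeholder
>> hostname, and 50060 is only the default TaskTracker HTTP port, so adjust
>> both to your cluster) is to try a plain TCP connection from one node to
>> the other:
>>
>>     // PortCheck.java: probe the TaskTracker HTTP port from the other node.
>>     // Hostname and port below are placeholders for your actual setup.
>>     import java.net.InetSocketAddress;
>>     import java.net.Socket;
>>
>>     public class PortCheck {
>>         public static void main(String[] args) throws Exception {
>>             String host = args.length > 0 ? args[0] : "slave";
>>             Socket s = new Socket();
>>             // Use a 5-second timeout so a dropped connection fails fast
>>             // instead of hanging.
>>             s.connect(new InetSocketAddress(host, 50060), 5000);
>>             System.out.println("Connected to " + host + ":50060");
>>             s.close();
>>         }
>>     }
>>
>> If this connects from the machine running the browser but not from the
>> other node, a firewall rule between the nodes is the likely cause.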
>>
>> - Aaron
>>
>> On Tue, Apr 7, 2009 at 8:08 AM, xiaolin guo <[email protected]> wrote:
>>
>> > This simple map-reduce application takes nearly 1 hour to finish
>> > running on the two-node cluster, due to lots of Failed/Killed task
>> > attempts, while on the single-node cluster it only takes 1 minute ...
>> > I am quite confused about why there are so many Failed/Killed
>> > attempts ...
>> >
>> > On Tue, Apr 7, 2009 at 10:40 PM, xiaolin guo <[email protected]> wrote:
>> >
>> > > I am trying to set up a small Hadoop cluster; everything was OK
>> > > before I moved from a single-node cluster to a two-node cluster. I
>> > > followed the article
>> > > http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)
>> > > to configure the master and slaves. However, when I tried to run the
>> > > example wordcount map-reduce application, the reduce task got stuck
>> > > at 19% for a long time. Then I got a notice: "INFO mapred.JobClient:
>> > > TaskId : attempt_200904072219_0001_m_000002_0, Status : FAILED too
>> > > many fetch errors" and an error message: "Error reading task
>> > > outputslave".
>> > >
>> > > All map tasks on both nodes had finished, which could be verified
>> > > on the task tracker pages.
>> > >
>> > > Both nodes work well in single-node mode, and the Hadoop file
>> > > system seems to be healthy in multi-node mode.
>> > >
>> > > Can anyone help me with this issue? I have been stuck on it for a
>> > > long time ...
>> > >
>> > > Thanks very much!
>> > >
>> >
>>
>
>
