I have checked the log and found that for each map task there are 3 failures, which look like machine1 (failed) -> machine2 (failed) -> machine1 (failed) -> machine2 (succeeded). All failures are "Too many fetch failures". And I am sure there is no firewall between the two nodes; at least port 50060 can be accessed from a web browser.
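For a check that goes beyond a single browser test, here is a minimal sketch of a reachability probe; the hostnames are placeholders for whatever names the TaskTrackers actually register, and the default port 50060 comes from mapred.task.tracker.http.address. Run it on each node, pointed at the other:

#!/usr/bin/env python3
# Minimal TaskTracker reachability probe: run on EACH node, pointed at the
# other one, since reachability can differ by direction. Hostnames passed on
# the command line are placeholders; use the exact names the TaskTrackers
# register, because those are what reducers resolve during the shuffle.
import socket
import sys

PORT = 50060  # default TaskTracker HTTP port (mapred.task.tracker.http.address)

def probe(host):
    try:
        with socket.create_connection((host, PORT), timeout=5):
            print("%s:%d reachable" % (host, PORT))
            return True
    except OSError as err:
        print("%s:%d NOT reachable: %s" % (host, PORT, err))
        return False

if __name__ == "__main__":
    hosts = sys.argv[1:] or ["master", "slave"]  # hypothetical node names
    results = [probe(h) for h in hosts]
    sys.exit(0 if all(results) else 1)

If the probe succeeds in one direction but not the other, that asymmetry alone can produce "Too many fetch failures".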
How can I check whether the two nodes can fetch mapper outputs from one another? I have no idea how reducers fetch this data (a sketch of such a fetch follows the quoted thread below). Thanks!

On Wed, Apr 8, 2009 at 2:21 AM, Aaron Kimball <[email protected]> wrote:
> Xiaolin,
>
> Are you certain that the two nodes can fetch mapper outputs from one
> another? If it's taking that long to complete, it might be the case that
> what makes it "complete" is just that eventually it abandons one of your
> two nodes and runs everything on a single node where it succeeds --
> defeating the point, of course.
>
> Might there be a firewall between the two nodes that blocks the port used
> by the reducer to fetch the mapper outputs? (I think this is on 50060 by
> default.)
>
> - Aaron
>
> On Tue, Apr 7, 2009 at 8:08 AM, xiaolin guo <[email protected]> wrote:
>
> > This simple map-reduce application takes nearly 1 hour to finish
> > running on the two-node cluster, due to lots of Failed/Killed task
> > attempts, while on the single-node cluster the same application takes
> > only 1 minute. I am quite confused about why there are so many
> > Failed/Killed attempts.
> >
> > On Tue, Apr 7, 2009 at 10:40 PM, xiaolin guo <[email protected]> wrote:
> >
> > > I am trying to set up a small Hadoop cluster; everything was fine
> > > until I moved from a single-node cluster to a two-node cluster. I
> > > followed the article
> > > http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)
> > > to configure the master and slaves. However, when I tried to run the
> > > example wordcount map-reduce application, the reduce task got stuck
> > > at 19% for a long time. Then I got a notice: "INFO mapred.JobClient:
> > > TaskId : attempt_200904072219_0001_m_000002_0, Status : FAILED too
> > > many fetch errors" and an error message: Error reading task output
> > > slave.
> > >
> > > All map tasks on both task nodes had finished, which can be verified
> > > on the task tracker pages.
> > >
> > > Both nodes work well in single-node mode, and the Hadoop file system
> > > seems to be healthy in multi-node mode.
> > >
> > > Can anyone help me with this issue? I have been entangled in it for
> > > a long time ...
> > >
> > > Thanks very much!
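For what it's worth, reduce tasks pull map outputs over HTTP from each TaskTracker's embedded web server. Here is a sketch of such a fetch; the /mapOutput servlet path and its query parameters reflect my reading of the 0.20-era TaskTracker and should be treated as an assumption, and the host and job ID are placeholders (the map attempt ID is the failing one reported in the log above):

#!/usr/bin/env python3
# Imitate a reducer's shuffle fetch against a TaskTracker. The /mapOutput
# servlet path and its query parameters are an assumption based on the
# 0.20-era sources; the host and job ID are placeholders, and the map
# attempt ID is the failing one reported in the job log above.
import urllib.request

HOST = "slave"                                 # placeholder TaskTracker host
JOB = "job_200904072219_0001"                  # placeholder job ID
MAP = "attempt_200904072219_0001_m_000002_0"   # map attempt from the log
REDUCE = 0                                     # reduce partition number

url = ("http://%s:50060/mapOutput?job=%s&map=%s&reduce=%d"
       % (HOST, JOB, MAP, REDUCE))
try:
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = resp.read()
        print("HTTP", resp.status, "- fetched", len(data), "bytes")
except Exception as err:
    print("fetch failed:", err)

Running this from the machine hosting the reduce task, against the node that produced the map output, should show whether the fetch path itself works outside of Hadoop.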
