Fixed the problem. The problem was that one of the nodes could not resolve the hostname of the other node. Even if I use IP addresses in the masters and slaves files, Hadoop will use the node's hostname instead of the IP address ...
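In my case the fix was to give both nodes a consistent /etc/hosts, so that each machine resolves the other's hostname to a reachable address. A minimal sketch, assuming the placeholder hostnames "master" and "slave" from the tutorial (the IPs below are made up; use your own):

    127.0.0.1    localhost
    192.168.0.1  master
    192.168.0.2  slave

On Ubuntu it is also worth checking that the machine's own hostname is not mapped to 127.0.1.1 in /etc/hosts. If it is, the TaskTracker can register itself under an address the other node cannot reach, which shows up as exactly these fetch failures.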
On Wed, Apr 8, 2009 at 7:26 PM, xiaolin guo <[email protected]> wrote:
> I have checked the log and found that for each map task there are 3
> failures, which look like machine1 (failed) -> machine2 (failed) ->
> machine1 (failed) -> machine2 (succeeded). All failures are "Too many
> fetch failures". And I am sure there is no firewall between the two
> nodes; at least port 50060 can be accessed from a web browser.
>
> How can I check whether the two nodes can fetch mapper outputs from one
> another? I have no idea how reducers fetch these data ...
>
> Thanks!
>
> On Wed, Apr 8, 2009 at 2:21 AM, Aaron Kimball <[email protected]> wrote:
>
>> Xiaolin,
>>
>> Are you certain that the two nodes can fetch mapper outputs from one
>> another? If it's taking that long to complete, it might be the case
>> that what makes it "complete" is just that eventually it abandons one
>> of your two nodes and runs everything on a single node where it
>> succeeds -- defeating the point, of course.
>>
>> Might there be a firewall between the two nodes that blocks the port
>> used by the reducer to fetch the mapper outputs? (I think this is on
>> 50060 by default.)
>>
>> - Aaron
>>
>> On Tue, Apr 7, 2009 at 8:08 AM, xiaolin guo <[email protected]> wrote:
>>
>>> This simple map-reduce application takes nearly 1 hour to finish
>>> running on the two-node cluster, due to lots of Failed/Killed task
>>> attempts, while on the single-node cluster it only takes 1 minute ...
>>> I am quite confused about why there are so many Failed/Killed
>>> attempts ...
>>>
>>> On Tue, Apr 7, 2009 at 10:40 PM, xiaolin guo <[email protected]> wrote:
>>>
>>>> I am trying to set up a small Hadoop cluster; everything was OK
>>>> before I moved from a single-node cluster to a two-node cluster. I
>>>> followed the article
>>>> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)
>>>> to configure the master and slaves. However, when I tried to run the
>>>> example wordcount map-reduce application, the reduce task got stuck
>>>> at 19% for a long time. Then I got a notice: "INFO mapred.JobClient:
>>>> TaskId : attempt_200904072219_0001_m_000002_0, Status : FAILED too
>>>> many fetch errors" and an error message: "Error reading task
>>>> outputslave".
>>>>
>>>> All map tasks on both task nodes had finished, which could be
>>>> verified on the task tracker pages.
>>>>
>>>> Both nodes work well in single-node mode, and the Hadoop file system
>>>> seems to be healthy in multi-node mode.
>>>>
>>>> Can anyone help me with this issue? I have been entangled in this
>>>> issue for a long time ...
>>>>
>>>> Thanks very much!
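To answer my own earlier question about checking whether the two nodes can fetch mapper outputs from one another: reducers pull map output over HTTP from each TaskTracker's web port (50060 by default), so a quick check is to request that port from the peer machine using the same hostname Hadoop uses. A rough sketch, again with placeholder hostnames:

    # run on master; repeat from slave with the names swapped
    ping -c 1 slave            # does the hostname resolve to the right IP?
    curl http://slave:50060/   # does the TaskTracker web UI answer?

If the hostname resolves to the wrong address (for example to 127.0.1.1), the fetch will fail even though the port itself is open.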
