Thanks for the reply. The user name was created properly as 'hadoop', so that is not the problem: the jobtracker is able to start tasks on the slave machine. (A quick way to confirm that the right daemons are running on each node is sketched below.)
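For completeness, here is the kind of check I mean. This is only a sketch, not pasted from my session: it assumes a Sun JDK so that jps is on the PATH, and the process IDs are made up.

  # On the master, after bin/start-all.sh. It should run both the master
  # and the worker daemons here, since it is listed in conf/slaves too.
  hadoop@dennis-laptop:~$ jps
  5401 NameNode
  5463 DataNode
  5529 JobTracker
  5597 TaskTracker

  # On the slave, only the worker daemons should appear.
  hadoop@traal:~$ jps
  3120 DataNode
  3188 TaskTracker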
I tried to play around and observed that if I make the remote machine the only slave (as opposed to the master node also acting as one of the slaves), the tasks run fine. It could be that making a node function as both master and slave is a bad idea, although I do not see any reason why it should be. I will try to get access to more slave machines and see if my guess is correct. Before blaming the dual role, though, I also want to rule out hostname resolution; see the sketch below.
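My (unconfirmed) reasoning: the reduce phase fetches map output over HTTP from the host name each tasktracker advertises, and Ubuntu's installer by default maps the machine's own hostname to 127.0.1.1 in /etc/hosts, which could make a host like 'traal' advertise an address the other machine cannot reach. A sketch of what I believe /etc/hosts should contain on both machines (the addresses are the ones from my setup):

  # /etc/hosts on BOTH machines; the key point is that each hostname
  # resolves to its LAN address, not to a loopback address.
  127.0.0.1       localhost
  192.168.1.150   dennis-laptop
  192.168.1.201   traal

  # Quick check from either machine:
  ping -c 1 traal           # should answer from 192.168.1.201
  ping -c 1 dennis-laptop   # should answer from 192.168.1.150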
thanks,
gaurav


jaylac wrote:
>
> Hi gaurav
>
> I'm also a beginner, but I'll try to share my views; they may not be
> correct.
>
> You said you've created a user name called "Hadoop" on both systems,
> but in the slaves file you've written [EMAIL PROTECTED] Is it not
> case sensitive? Try changing it to [EMAIL PROTECTED] on both
> systems.
>
> Also, try using ports 9010 and 9011 in the hadoop-site.xml file.
>
> These might be in no way related to your problem, but still, try them
> and let me know.
>
> Regards,
> Jaya
>
>
> Gaurav Agarwal wrote:
>>
>> Hi everyone!
>>
>> I am a new user of Hadoop and am trying to set up a small cluster
>> using Hadoop (the Mar 02 release) on Ubuntu 6.10 (Edgy), but I am
>> facing some issues doing that.
>>
>> I am trying to run the Hadoop 'wordcount' example program which comes
>> bundled with it. I am able to successfully run the program on a
>> single-node cluster (that is, using my local machine only). But when
>> I try to run the same program on a cluster of two machines, the
>> program hangs in the 'reduce' phase.
>>
>> Settings:
>>
>> Master node: 192.168.1.150 (dennis-laptop)
>> Slave node: 192.168.1.201 (traal)
>>
>> The user account on both master and slave is named: Hadoop
>>
>> Password-less ssh login to the slave from the master is working.
>>
>> JAVA_HOME is set appropriately in the hadoop-env.sh file on both
>> master and slave.
>>
>> MASTER
>>
>> 1) conf/slaves
>> localhost
>> [EMAIL PROTECTED]
>>
>> 2) conf/master
>> localhost
>>
>> 3) conf/hadoop-site.xml
>> <?xml version="1.0"?>
>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>
>> <!-- Put site-specific property overrides in this file. -->
>>
>> <configuration>
>>   <property>
>>     <name>fs.default.name</name>
>>     <value>192.168.1.150:50000</value>
>>   </property>
>>
>>   <property>
>>     <name>mapred.job.tracker</name>
>>     <value>192.168.1.150:50001</value>
>>   </property>
>>
>>   <property>
>>     <name>dfs.replication</name>
>>     <value>2</value>
>>   </property>
>> </configuration>
>>
>> SLAVE
>>
>> 1) conf/slaves
>> localhost
>>
>> 2) conf/master
>> [EMAIL PROTECTED]
>>
>> 3) conf/hadoop-site.xml
>> <?xml version="1.0"?>
>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>
>> <!-- Put site-specific property overrides in this file. -->
>>
>> <configuration>
>>   <property>
>>     <name>fs.default.name</name>
>>     <value>192.168.1.150:50000</value>
>>   </property>
>>
>>   <property>
>>     <name>mapred.job.tracker</name>
>>     <value>192.168.1.150:50001</value>
>>   </property>
>>
>>   <property>
>>     <name>dfs.replication</name>
>>     <value>2</value>
>>   </property>
>> </configuration>
>>
>> CONSOLE OUTPUT
>>
>> bin/hadoop jar hadoop-*-examples.jar wordcount -m 10 -r 2 input output
>> 07/03/06 23:17:17 INFO mapred.InputFormatBase: Total input paths to process : 1
>> 07/03/06 23:17:18 INFO mapred.JobClient: Running job: job_0001
>> 07/03/06 23:17:19 INFO mapred.JobClient:  map 0% reduce 0%
>> 07/03/06 23:17:29 INFO mapred.JobClient:  map 20% reduce 0%
>> 07/03/06 23:17:30 INFO mapred.JobClient:  map 40% reduce 0%
>> 07/03/06 23:17:32 INFO mapred.JobClient:  map 80% reduce 0%
>> 07/03/06 23:17:33 INFO mapred.JobClient:  map 100% reduce 0%
>> 07/03/06 23:17:42 INFO mapred.JobClient:  map 100% reduce 3%
>> 07/03/06 23:17:43 INFO mapred.JobClient:  map 100% reduce 5%
>> 07/03/06 23:17:44 INFO mapred.JobClient:  map 100% reduce 8%
>> 07/03/06 23:17:52 INFO mapred.JobClient:  map 100% reduce 10%
>> 07/03/06 23:17:53 INFO mapred.JobClient:  map 100% reduce 13%
>> 07/03/06 23:18:03 INFO mapred.JobClient:  map 100% reduce 16%
>>
>> The only exception I can see in the log files is in the TaskTracker
>> log file:
>>
>> 2007-03-06 23:17:32,214 INFO org.apache.hadoop.mapred.TaskRunner:
>> task_0001_r_000000_0 Copying task_0001_m_000002_0 output from traal.
>> 2007-03-06 23:17:32,221 INFO org.apache.hadoop.mapred.TaskRunner:
>> task_0001_r_000000_0 Copying task_0001_m_000001_0 output from
>> dennis-laptop.
>> 2007-03-06 23:17:32,368 WARN org.apache.hadoop.mapred.TaskRunner:
>> task_0001_r_000000_0 copy failed: task_0001_m_000002_0 from traal
>> 2007-03-06 23:17:32,368 WARN org.apache.hadoop.mapred.TaskRunner:
>> java.io.IOException: File
>> /tmp/hadoop-hadoop/mapred/local/task_0001_r_000000_0/map_2.out-0 not created
>>         at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.copyOutput(ReduceTaskRunner.java:301)
>>         at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.run(ReduceTaskRunner.java:262)
>>
>> 2007-03-06 23:17:32,369 WARN org.apache.hadoop.mapred.TaskRunner:
>> task_0001_r_000000_0 adding host traal to penalty box, next contact
>> in 99 seconds
>>
>> I am attaching the master log files in case anyone wants to check
>> them.
>>
>> Any help will be greatly appreciated!
>>
>> -gaurav
>>
>> http://www.nabble.com/file/7013/hadoop-hadoop-tasktracker-dennis-laptop.log
>> http://www.nabble.com/file/7012/hadoop-hadoop-jobtracker-dennis-laptop.log
>> http://www.nabble.com/file/7011/hadoop-hadoop-namenode-dennis-laptop.log
>> http://www.nabble.com/file/7010/hadoop-hadoop-datanode-dennis-laptop.log
