Hi Gaurav,

Does this error always happen? Our settings are similar. My logs contain some IOException error messages about not being able to obtain certain blocks or not being able to create a new block. Although the program sometimes hung, in most cases the jobs completed with correct results. BTW, I am running the grep sample program on version 0.11.2.
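For reference, a typical invocation of the bundled grep example looks like the one below; the regex and the input/output paths are just the stock quickstart placeholders, not necessarily exactly what I ran:

  bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'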
Best Regards,
Richard Yang
[EMAIL PROTECTED]
[EMAIL PROTECTED]

-----Original Message-----
From: Gaurav Agarwal [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 07, 2007 12:22 AM
To: [email protected]
Subject: Hadoop 'wordcount' program hanging in the Reduce phase.

Hi Everyone!

I am a new user to Hadoop and am trying to set up a small cluster with it, but I am facing some issues doing that. I am trying to run the Hadoop 'wordcount' example program that comes bundled with it. I am able to run the program successfully on a single-node cluster (that is, using my local machine only), but when I try to run the same program on a cluster of two machines, the program hangs in the 'reduce' phase.

Settings:

Master Node: 192.168.1.150 (dennis-laptop)
Slave Node: 192.168.1.201 (traal)
User account on both Master and Slave is named: Hadoop
Password-less ssh login to the Slave from the Master is working.
JAVA_HOME is set appropriately in the hadoop-env.sh file on both Master and Slave.

MASTER

1) conf/slaves
localhost
[EMAIL PROTECTED]

2) conf/master
localhost

3) conf/hadoop-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>192.168.1.150:50000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.1.150:50001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

SLAVE

1) conf/slaves
localhost

2) conf/master
[EMAIL PROTECTED]

3) conf/hadoop-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>192.168.1.150:50000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.1.150:50001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

CONSOLE OUTPUT

bin/hadoop jar hadoop-*-examples.jar wordcount -m 10 -r 2 input output
07/03/06 23:17:17 INFO mapred.InputFormatBase: Total input paths to process : 1
07/03/06 23:17:18 INFO mapred.JobClient: Running job: job_0001
07/03/06 23:17:19 INFO mapred.JobClient: map 0% reduce 0%
07/03/06 23:17:29 INFO mapred.JobClient: map 20% reduce 0%
07/03/06 23:17:30 INFO mapred.JobClient: map 40% reduce 0%
07/03/06 23:17:32 INFO mapred.JobClient: map 80% reduce 0%
07/03/06 23:17:33 INFO mapred.JobClient: map 100% reduce 0%
07/03/06 23:17:42 INFO mapred.JobClient: map 100% reduce 3%
07/03/06 23:17:43 INFO mapred.JobClient: map 100% reduce 5%
07/03/06 23:17:44 INFO mapred.JobClient: map 100% reduce 8%
07/03/06 23:17:52 INFO mapred.JobClient: map 100% reduce 10%
07/03/06 23:17:53 INFO mapred.JobClient: map 100% reduce 13%
07/03/06 23:18:03 INFO mapred.JobClient: map 100% reduce 16%

The only exception I can see in the log files is in the 'TaskTracker' log file:

2007-03-06 23:17:32,214 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000000_0 Copying task_0001_m_000002_0 output from traal.
2007-03-06 23:17:32,221 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000000_0 Copying task_0001_m_000001_0 output from dennis-laptop.
2007-03-06 23:17:32,368 WARN org.apache.hadoop.mapred.TaskRunner: task_0001_r_000000_0 copy failed: task_0001_m_000002_0 from traal
2007-03-06 23:17:32,368 WARN org.apache.hadoop.mapred.TaskRunner: java.io.IOException: File /tmp/hadoop-hadoop/mapred/local/task_0001_r_000000_0/map_2.out-0 not created
        at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.copyOutput(ReduceTaskRunner.java:301)
        at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.run(ReduceTaskRunner.java:262)
2007-03-06 23:17:32,369 WARN org.apache.hadoop.mapred.TaskRunner: task_0001_r_000000_0 adding host traal to penalty box, next contact in 99 seconds

I am attaching the master log files just in case anyone wants to check them. Any help will be greatly appreciated!

-gaurav

http://www.nabble.com/file/7013/hadoop-hadoop-tasktracker-dennis-laptop.log hadoop-hadoop-tasktracker-dennis-laptop.log
http://www.nabble.com/file/7012/hadoop-hadoop-jobtracker-dennis-laptop.log hadoop-hadoop-jobtracker-dennis-laptop.log
http://www.nabble.com/file/7011/hadoop-hadoop-namenode-dennis-laptop.log hadoop-hadoop-namenode-dennis-laptop.log
http://www.nabble.com/file/7010/hadoop-hadoop-datanode-dennis-laptop.log hadoop-hadoop-datanode-dennis-laptop.log
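For anyone reproducing the setup described above, the usual sequence to bring the two-node cluster up and launch the job from the master looks roughly like this. It is only a sketch based on the standard Hadoop start-up scripts of that era, and the local 'input' directory name is an assumption; it is not taken from the original message beyond the wordcount command itself:

  # run on the master (dennis-laptop) as the hadoop user
  bin/hadoop namenode -format          # one-time formatting of HDFS before the first start
  bin/start-all.sh                     # starts NameNode/JobTracker here and DataNodes/TaskTrackers on the hosts in conf/slaves
  bin/hadoop dfs -put input input      # copy the local input directory into HDFS
  bin/hadoop jar hadoop-*-examples.jar wordcount -m 10 -r 2 input output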
