Could you take a look at the task logs in $HADOOP_LOG_DIR/logs/<reduce-task-id>/syslog/part*? They will contain info on what's going wrong. If it is happening consistently, there is most likely some misconfiguration. Let us know what exceptions, etc., you see there.
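
One way to follow up on the suggestion above is to scan the reduce task's syslog parts for exception lines. The sketch below is in Python and is only illustrative: the directory layout is the one given in the reply, while the function name `find_problem_lines` and the keyword list are assumptions made for this example, not anything Hadoop defines. The task id is the one shown in the quoted tasktracker log.

```python
import glob
import os
import re

# Keywords that often mark a failure in a Java log.
# This list is a heuristic chosen for the example, not a Hadoop convention.
PROBLEM = re.compile(r"Exception|Error|FATAL|Caused by", re.IGNORECASE)


def find_problem_lines(log_dir, task_id):
    """Return (path, line_number, text) for suspicious lines in a task's syslog parts."""
    # Layout as given in the reply: <log_dir>/logs/<task-id>/syslog/part*
    pattern = os.path.join(log_dir, "logs", task_id, "syslog", "part*")
    hits = []
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if PROBLEM.search(line):
                    hits.append((path, number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Falls back to the current directory if HADOOP_LOG_DIR is not set.
    log_dir = os.environ.get("HADOOP_LOG_DIR", ".")
    # Reduce task id copied from the tasktracker log in the quoted message.
    for path, number, text in find_problem_lines(log_dir, "task_0001_r_000000_0"):
        print(f"{path}:{number}: {text}")
```

An empty result only means none of the keywords matched; it is still worth reading the last few lines of each part file by hand.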
> -----Original Message-----
> From: Xiaoguang Qi [mailto:[EMAIL PROTECTED]
> Sent: Thursday, September 06, 2007 8:51 PM
> To: [email protected]
> Subject: hadoop hang on reduce
>
> Hi, all --
>
> I was trying to configure hadoop to work on two machines. The
> dfs seems to work fine. But when I tried the 'grep' example
> in 'hadoop-0.13.1-examples.jar', it always hangs when the map
> tasks finish and the reduce tasks start. I thought
> this could be a network problem, so I reconfigured it to run
> on a single machine, but still running in distributed mode.
> The problem remains. Here are the configuration files.
>
> ========== hadoop-site.xml ==========
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
>
> <property>
>   <name>fs.default.name</name>
>   <value>(masked machine name):9000</value>
> </property>
>
> <property>
>   <name>mapred.job.tracker</name>
>   <value>(masked machine name):9001</value>
> </property>
>
> <property>
>   <name>dfs.replication</name>
>   <value>1</value>
> </property>
>
> <property>
>   <name>dfs.name.dir</name>
>   <value>dfs-space/dfs/name</value>
> </property>
>
> <property>
>   <name>dfs.data.dir</name>
>   <value>dfs-space/dfs/data</value>
> </property>
>
> <property>
>   <name>mapred.local.dir</name>
>   <value>dfs-space/mapred/local</value>
> </property>
>
> </configuration>
>
>
> ========== mapred-default.xml ==========
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put mapred-specific property overrides in this file. -->
>
> <configuration>
> <property>
>   <name>mapred.map.tasks</name>
>   <value>20</value>
> </property>
>
> <property>
>   <name>mapred.reduce.tasks</name>
>   <value>1</value>
> </property>
> </configuration>
>
>
> When I run the following command:
> bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
>
> here's what the screen shows:
>
> 07/09/06 23:10:20 INFO mapred.FileInputFormat: Total input paths to process : 3
> 07/09/06 23:10:20 INFO mapred.JobClient: Running job: job_0001
> 07/09/06 23:10:21 INFO mapred.JobClient: map 0% reduce 0%
> 07/09/06 23:10:32 INFO mapred.JobClient: map 4% reduce 0%
> 07/09/06 23:10:33 INFO mapred.JobClient: map 13% reduce 0%
> 07/09/06 23:10:34 INFO mapred.JobClient: map 18% reduce 0%
> 07/09/06 23:10:35 INFO mapred.JobClient: map 22% reduce 0%
> 07/09/06 23:10:36 INFO mapred.JobClient: map 27% reduce 0%
> 07/09/06 23:10:37 INFO mapred.JobClient: map 36% reduce 0%
> 07/09/06 23:10:39 INFO mapred.JobClient: map 45% reduce 0%
> 07/09/06 23:10:40 INFO mapred.JobClient: map 49% reduce 0%
> 07/09/06 23:10:41 INFO mapred.JobClient: map 54% reduce 0%
> 07/09/06 23:10:42 INFO mapred.JobClient: map 59% reduce 0%
> 07/09/06 23:10:43 INFO mapred.JobClient: map 68% reduce 0%
> 07/09/06 23:10:45 INFO mapred.JobClient: map 77% reduce 0%
> 07/09/06 23:10:47 INFO mapred.JobClient: map 86% reduce 0%
> 07/09/06 23:10:49 INFO mapred.JobClient: map 95% reduce 0%
> 07/09/06 23:10:50 INFO mapred.JobClient: map 100% reduce 0%
>
> Then the program hangs for a long time until I kill it.
> Here's what I find in the 'tasktracker' log file:
>
> ......
> 2007-09-06 22:54:52,569 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_0001_m_000021_0
> 2007-09-06 22:54:53,942 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000019_0 1.0% hdfs://(masked machine name):9000/user/(masked user name)/input/hadoop-default.xml:26068+1018
> 2007-09-06 22:54:53,944 INFO org.apache.hadoop.mapred.TaskTracker: Task task_0001_m_000019_0 is done.
> 2007-09-06 22:54:54,040 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000021_0 1.0% hdfs://(masked machine name):9000/user/(masked user name)/input/hadoop-site.xml:0+178
> 2007-09-06 22:54:54,043 INFO org.apache.hadoop.mapred.TaskTracker: Task task_0001_m_000021_0 is done.
> 2007-09-06 22:54:54,059 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_0001_r_000000_0
> 2007-09-06 22:54:55,935 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000000_0 0.0% reduce > copy >
> 2007-09-06 22:54:56,939 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000000_0 0.0% reduce > copy >
> 2007-09-06 22:54:57,942 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000000_0 0.0% reduce > copy >
> 2007-09-06 22:54:58,947 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000000_0 0.0% reduce > copy >
> ......
>
> The last line repeats until the end of the log file.
>
> Does anyone have an idea what the problem is? Any suggestions are
> appreciated!
>
