Thanks for your reply!

I looked at the log file as you suggested. Here's the error I found:

2007-09-11 14:28:07,451 INFO org.apache.hadoop.mapred.ReduceTask: task_0002_r_000000_0 Got 3 known map output location(s); scheduling...
2007-09-11 14:28:07,452 INFO org.apache.hadoop.mapred.ReduceTask: task_0002_r_000000_0 Copying task_0002_m_000001_0 output from (*machine name*).
2007-09-11 14:28:07,475 WARN org.apache.hadoop.mapred.ReduceTask: task_0002_r_000000_0 copy failed: task_0002_m_000001_0 from (*machine name*)
2007-09-11 14:28:07,477 WARN org.apache.hadoop.mapred.ReduceTask: java.io.IOException: Server returned HTTP response code: 500 for URL: http://(*machine name*):50060/mapOutput?map=task_0002_m_000001_0&reduce=0
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1174)
        at org.apache.hadoop.mapred.MapOutputLocation.getFile(MapOutputLocation.java:206)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:680)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:641)

2007-09-11 14:28:07,480 INFO org.apache.hadoop.mapred.ReduceTask: task_0002_r_000000_0 Scheduled 1 of 3 known outputs (0 slow hosts and 2 dup hosts)
2007-09-11 14:28:07,480 WARN org.apache.hadoop.mapred.ReduceTask: task_0002_r_000000_0 adding host (*machine name*) to penalty box, next contact in 278 seconds
2007-09-11 14:28:07,480 INFO org.apache.hadoop.mapred.ReduceTask: task_0002_r_000000_0 Need 3 map output(s)
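
In case it is useful for going through the rest of the log, here is a rough Python sketch of how the failed map-output fetches could be pulled out of a reduce task syslog. It only assumes the line layout visible in the excerpt above (timestamp, level, logger name, message); the names parse_entries and failed_fetches are made up for this illustration and are not part of Hadoop.

```python
import re
from collections import namedtuple

# Start of an entry as it appears in the excerpt above:
# "2007-09-11 14:28:07,451 INFO some.Logger: message"
ENTRY_START = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+) ([\w.$]+):\s*(.*)$"
)
# HTTP status plus the map/reduce ids taken from the mapOutput URL.
HTTP_FAILURE = re.compile(r"HTTP response code: (\d+).*?map=(\w+)&reduce=(\d+)")

Entry = namedtuple("Entry", "timestamp level logger message")


def parse_entries(lines):
    """Group raw lines into entries.

    A line that does not start with a timestamp (a wrapped message or a
    stack-trace frame) is appended to the previous entry.
    """
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = ENTRY_START.match(line)
        if match:
            entries.append(Entry(*match.groups()))
        elif entries:
            last = entries[-1]
            merged = (last.message + " " + line).strip()
            entries[-1] = last._replace(message=merged)
    return entries


def failed_fetches(entries):
    """Return (timestamp, http_code, map_task, reduce_index) per failed fetch."""
    found = []
    for entry in entries:
        if entry.level != "WARN":
            continue
        match = HTTP_FAILURE.search(entry.message)
        if match:
            code, map_task, reduce_index = match.groups()
            found.append((entry.timestamp, int(code), map_task, int(reduce_index)))
    return found
```

To use it, pass the lines of a syslog part file (for example, an open file object) to parse_entries and hand the result to failed_fetches. Each tuple then names the map task whose output could not be copied and the HTTP status the tasktracker returned.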




On 9/10/07, Devaraj Das <[EMAIL PROTECTED]> wrote:
> Could you take a look at the task logs
> $HADOOP_LOG_DIR/logs/<reduce-task-id>/syslog/part* . That will contain info
> on what's going wrong. If it is consistently happening, there most likely is
> some misconfig. Let us know what exceptions, etc. you see there.
>
> > -----Original Message-----
> > From: Xiaoguang Qi [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, September 06, 2007 8:51 PM
> > To: [email protected]
> > Subject: hadoop hang on reduce
> >
> > Hi, all --
> >
> > I was trying to configure hadoop to work on two machines. The
> > dfs seems to work fine. But when I tried the 'grep' example
> > in 'hadoop-0.13.1-examples.jar', it always hangs once the map
> > tasks finish and the reduce tasks start. I thought
> > this could be a network problem, so I reconfigured it to run
> > on a single machine, still in distributed mode.
> > The problem remains. Here are the configuration files.
> >
> > ========== hadoop-site.xml ==========
> > <?xml version="1.0"?>
> > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> >
> > <!-- Put site-specific property overrides in this file. -->
> >
> > <configuration>
> >
> >   <property>
> >     <name>fs.default.name</name>
> >     <value>(masked machine name):9000</value>
> >   </property>
> >
> >   <property>
> >     <name>mapred.job.tracker</name>
> >     <value>(masked machine name):9001</value>
> >   </property>
> >
> >   <property>
> >     <name>dfs.replication</name>
> >     <value>1</value>
> >   </property>
> >
> >   <property>
> >     <name>dfs.name.dir</name>
> >     <value>dfs-space/dfs/name</value>
> >   </property>
> >
> >   <property>
> >     <name>dfs.data.dir</name>
> >     <value>dfs-space/dfs/data</value>
> >   </property>
> >
> >   <property>
> >     <name>mapred.local.dir</name>
> >     <value>dfs-space/mapred/local</value>
> >   </property>
> >
> > </configuration>
> >
> >
> > ========== mapred-default.xml ==========
> > <?xml version="1.0"?>
> > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> >
> > <!-- Put mapred-specific property overrides in this file. -->
> >
> > <configuration>
> >   <property>
> >     <name>mapred.map.tasks</name>
> >     <value>20</value>
> >   </property>
> >
> >   <property>
> >     <name>mapred.reduce.tasks</name>
> >     <value>1</value>
> >   </property>
> > </configuration>
> >
> >
> > When I run the following command:
> > bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
> >
> > here's what the screen shows:
> >
> > 07/09/06 23:10:20 INFO mapred.FileInputFormat: Total input paths to process : 3
> > 07/09/06 23:10:20 INFO mapred.JobClient: Running job: job_0001
> > 07/09/06 23:10:21 INFO mapred.JobClient:  map 0% reduce 0%
> > 07/09/06 23:10:32 INFO mapred.JobClient:  map 4% reduce 0%
> > 07/09/06 23:10:33 INFO mapred.JobClient:  map 13% reduce 0%
> > 07/09/06 23:10:34 INFO mapred.JobClient:  map 18% reduce 0%
> > 07/09/06 23:10:35 INFO mapred.JobClient:  map 22% reduce 0%
> > 07/09/06 23:10:36 INFO mapred.JobClient:  map 27% reduce 0%
> > 07/09/06 23:10:37 INFO mapred.JobClient:  map 36% reduce 0%
> > 07/09/06 23:10:39 INFO mapred.JobClient:  map 45% reduce 0%
> > 07/09/06 23:10:40 INFO mapred.JobClient:  map 49% reduce 0%
> > 07/09/06 23:10:41 INFO mapred.JobClient:  map 54% reduce 0%
> > 07/09/06 23:10:42 INFO mapred.JobClient:  map 59% reduce 0%
> > 07/09/06 23:10:43 INFO mapred.JobClient:  map 68% reduce 0%
> > 07/09/06 23:10:45 INFO mapred.JobClient:  map 77% reduce 0%
> > 07/09/06 23:10:47 INFO mapred.JobClient:  map 86% reduce 0%
> > 07/09/06 23:10:49 INFO mapred.JobClient:  map 95% reduce 0%
> > 07/09/06 23:10:50 INFO mapred.JobClient:  map 100% reduce 0%
> >
> > Then the program hangs for a long time until I kill it.
> > Here's what I find in the 'tasktracker' log file:
> >
> > ......
> > 2007-09-06 22:54:52,569 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_0001_m_000021_0
> > 2007-09-06 22:54:53,942 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000019_0 1.0% hdfs://(masked machine name):9000/user/(masked user name)/input/hadoop-default.xml:26068+1018
> > 2007-09-06 22:54:53,944 INFO org.apache.hadoop.mapred.TaskTracker: Task task_0001_m_000019_0 is done.
> > 2007-09-06 22:54:54,040 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000021_0 1.0% hdfs://(masked machine name):9000/user/(masked user name)/input/hadoop-site.xml:0+178
> > 2007-09-06 22:54:54,043 INFO org.apache.hadoop.mapred.TaskTracker: Task task_0001_m_000021_0 is done.
> > 2007-09-06 22:54:54,059 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_0001_r_000000_0
> > 2007-09-06 22:54:55,935 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000000_0 0.0% reduce > copy >
> > 2007-09-06 22:54:56,939 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000000_0 0.0% reduce > copy >
> > 2007-09-06 22:54:57,942 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000000_0 0.0% reduce > copy >
> > 2007-09-06 22:54:58,947 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000000_0 0.0% reduce > copy >
> > ......
> >
> > The last line repeats until the end of the log file.
> >
> > Does anyone have an idea what the problem is? Any suggestions are
> > appreciated!
> >
>
>
