Thanks! This makes a lot of sense - I do see a lot of 'slow' hosts in the logs.
Is it normal for hosts to be classified as 'slow'? (Given that there's ample bandwidth and no errors, I don't know why a peer would be classified as such in the first place.)

-----Original Message-----
From: Stu Hood [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 07, 2007 2:58 PM
To: [email protected]
Subject: RE: extremely slow reduce jobs

Hi,

I have noticed the same thing. My cluster is only 2 machines, but I've noticed "Scheduled 0 of 1 known outputs (1 slow hosts and 0 dup hosts)" over and over in userlog/task*_r_*/syslog on the namenode machine.

I just found this thread which explains the problem:
http://mail-archives.apache.org/mod_mbox/lucene-hadoop-dev/200702.mbox/% [EMAIL PROTECTED]

Thanks,
Stu

-----Original Message-----
From: Joydeep Sen Sarma <[EMAIL PROTECTED]>
Sent: Fri, August 3, 2007 3:17 pm
To: [email protected]
Subject: extremely slow reduce jobs

I have a fairly simple job with a map, a local combiner and a reduce. The combiner and the reduce do the equivalent of a group_concat (mysql).

I have horrible performance in the reduce stage:
- the map jobs are done
- all the reduce jobs claim they are copying data
- but the copy rate is abysmal (0.5MBps)
- checked the network topology - everything's on GigE and on the same switch (80 machine cluster)
- seeing 50+ MBps bandwidth between any pair using scp
- when I look at the machines where reduce is running - vmstat says 0% cpu util.

A sample reducetask log is below.

Job conf: 64 way reduce. I specified the same number of map tasks, but hadoop is creating 386 map tasks anyway.

Does anyone have any quick hints on what could be going wrong?

Thanks,
Joydeep

2007-08-03 12:06:54,408 INFO org.apache.hadoop.mapred.ReduceTask: task_0169_r_000010_0 Got 2 known map output location(s); scheduling...
2007-08-03 12:06:54,408 INFO org.apache.hadoop.mapred.ReduceTask: task_0169_r_000010_0 Scheduled 0 of 2 known outputs (2 slow hosts and 0 dup hosts)
2007-08-03 12:06:59,409 INFO org.apache.hadoop.mapred.ReduceTask: task_0169_r_000010_0 Need 1 map output(s)
2007-08-03 12:06:59,410 INFO org.apache.hadoop.mapred.ReduceTask: task_0169_r_000010_0 Got 0 new map outputs from tasktracker and 0 map outputs from previous failures
2007-08-03 12:06:59,410 INFO org.apache.hadoop.mapred.ReduceTask: task_0169_r_000010_0 Got 2 known map output location(s); scheduling...
2007-08-03 12:06:59,410 INFO org.apache.hadoop.mapred.ReduceTask: task_0169_r_000010_0 Scheduled 0 of 2 known outputs (2 slow hosts and 0 dup hosts)
2007-08-03 12:07:04,411 INFO org.apache.hadoop.mapred.ReduceTask: task_0169_r_000010_0 Need 1 map output(s)
2007-08-03 12:07:04,412 INFO org.apache.hadoop.mapred.ReduceTask: task_0169_r_000010_0 Got 0 new map outputs from tasktracker and 0 map outputs from previous failures
2007-08-03 12:07:04,412 INFO org.apache.hadoop.mapred.ReduceTask: task_0169_r_000010_0 Got 2 known map output location(s); scheduling...
2007-08-03 12:07:04,412 INFO org.apache.hadoop.mapred.ReduceTask: task_0169_r_000010_0 Scheduled 0 of 2 known outputs (2 slow hosts and 0 dup hosts)
2007-08-03 12:07:09,413 INFO org.apache.hadoop.mapred.ReduceTask: task_0169_r_000010_0 Need 1 map output(s)
2007-08-03 12:07:09,413 INFO org.apache.hadoop.mapred.ReduceTask: task_0169_r_000010_0 Got 0 new map outputs from tasktracker and 0 map outputs from previous failures
2007-08-03 12:07:09,413 INFO org.apache.hadoop.mapred.ReduceTask: task_0169_r_000010_0 Got 2 known map output location(s); scheduling...
2007-08-03 12:07:09,413 INFO org.apache.hadoop.mapred.ReduceTask: task_0169_r_000010_0 Scheduled 0 of 2 known outputs (2 slow hosts and 0 dup hosts)
2007-08-03 12:07:14,415 INFO org.apache.hadoop.mapred.ReduceTask: task_0169_r_000010_0 Need 1 map output(s)
2007-08-03 12:07:14,415 INFO org.apache.hadoop.mapred.ReduceTask: task_0169_r_000010_0 Got 0 new map outputs from tasktracker and 0 map outputs from previous failures
2007-08-03 12:07:14,415 INFO org.apache.hadoop.mapred.ReduceTask: task_0169_r_000010_0 Got 2 known map output location(s); scheduling...
2007-08-03 12:07:14,415 INFO org.apache.hadoop.mapred.ReduceTask: task_0169_r_000010_0 Scheduled 0 of 2 known outputs (2 slow hosts and 0 dup hosts)
2007-08-03 12:07:19,417 INFO org.apache.hadoop.mapred.ReduceTask: task_0169_r_000010_0 Need 1 map output(s)
2007-08-03 12:07:19,418 INFO org.apache.hadoop.mapred.ReduceTask: task_0169_r_000010_0 Got 0 new map outputs from tasktracker and 0 map outputs from previous failures
2007-08-03 12:07:19,418 INFO org.apache.hadoop.mapred.ReduceTask: task_0169_r_000010_0 Got 2 known map output location(s); scheduling...
2007-08-03 12:07:19,418 INFO org.apache.hadoop.mapred.ReduceTask: task_0169_r_000010_0 Scheduled 0 of 2 known outputs (2 slow hosts and 0 dup hosts)
2007-08-03 12:07:24,419 INFO org.apache.hadoop.mapred.ReduceTask: task_0169_r_000010_0 Need 1 map output(s)
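
To gauge how often a reduce task is stuck in the pattern shown in the log above, the "Scheduled N of M known outputs" lines can be tallied per task. The following is only a rough sketch: the function name summarize_scheduling and the "rounds"/"stalled" counters are made up for illustration, and the regular expression assumes the log wording is exactly as it appears in this thread.

```python
import re
from collections import defaultdict

# Pattern for the ReduceTask scheduling lines quoted in this thread.
# Assumption: the message wording matches the sample log exactly.
SCHEDULED = re.compile(
    r"(?P<task>task_\S+) Scheduled (?P<scheduled>\d+) of (?P<known>\d+) "
    r"known outputs \((?P<slow>\d+) slow hosts and (?P<dup>\d+) dup hosts\)"
)


def summarize_scheduling(lines):
    """Count scheduling rounds per task and how many of them scheduled nothing.

    Returns {task_id: {"rounds": int, "stalled": int}}, where "stalled" counts
    rounds in which 0 outputs were scheduled while at least one host was
    reported as slow.
    """
    stats = defaultdict(lambda: {"rounds": 0, "stalled": 0})
    for line in lines:
        # finditer copes with several log entries run together on one line.
        for match in SCHEDULED.finditer(line):
            entry = stats[match.group("task")]
            entry["rounds"] += 1
            if int(match.group("scheduled")) == 0 and int(match.group("slow")) > 0:
                entry["stalled"] += 1
    return dict(stats)
```

Passing the lines of a reduce task's syslog (for example, an open file object) to summarize_scheduling gives a per-task count; a task whose "stalled" count is close to its "rounds" count is spending nearly every scheduling pass waiting on hosts it considers slow.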
