On Fri, Aug 03, 2007 at 12:17:37PM -0700, Joydeep Sen Sarma wrote:
>I have a fairly simple job with a map, a local combiner and a reduce.
>The combiner and the reduce do the equivalent of a group_concat (mysql).
>
>
>I have horrible performance in the reduce stage:
>- the map jobs are done
>- all the reduce jobs claim they are copying data - but the copy rate is
>abysmal (0.5MBps)
> - checked the network topology - everything's on GigE and on same
>switch. (80 machine cluster)
> - seeing 50+ MBps bandwidth between any pair using scp
>- when I look at the machines where reduce is running - vmstat says 0%
>cpu util.
>
>A sample reducetask log is below. Job conf: 64 way reduce. I specified
>the map tasks to the same number - but hadoop is anyway creating 386 map
>tasks.
>

The number of maps is only a hint to the JobTracker; to truly control the number of maps you need to write your own input-split:
http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/InputSplit.html

>Anyone has some quick hints on what could be going wrong?
>

Couple of things:

I have no silver bullet, but the *slow hosts* count is one clue: there were a couple of failures when trying to fetch map-outputs; do you see any exceptions in your reduce task's syslog? (in logs/userlogs/${reduce_taskid}/syslog/part-*)

Pertinent piece of information: there are some bugs (up to and including the 0.14.0 release) with respect to fetch-failures leading to hung reduces. Please look at http://issues.apache.org/jira/browse/HADOOP-1158 for more details...

Hope that helps, apologies for the late response.

Arun

>Thanks,
>
>Joydeep
>
>2007-08-03 12:06:54,408 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Got 2 known map output location(s); scheduling...
>2007-08-03 12:06:54,408 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Scheduled 0 of 2 known outputs (2 slow hosts and 0
>dup hosts)
>2007-08-03 12:06:59,409 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Need 1 map output(s)
>2007-08-03 12:06:59,410 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Got 0 new map outputs from tasktracker and 0 map
>outputs from previous failures
>2007-08-03 12:06:59,410 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Got 2 known map output location(s); scheduling...
>2007-08-03 12:06:59,410 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Scheduled 0 of 2 known outputs (2 slow hosts and 0
>dup hosts)
>2007-08-03 12:07:04,411 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Need 1 map output(s)
>2007-08-03 12:07:04,412 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Got 0 new map outputs from tasktracker and 0 map
>outputs from previous failures
>2007-08-03 12:07:04,412 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Got 2 known map output location(s); scheduling...
>2007-08-03 12:07:04,412 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Scheduled 0 of 2 known outputs (2 slow hosts and 0
>dup hosts)
>2007-08-03 12:07:09,413 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Need 1 map output(s)
>2007-08-03 12:07:09,413 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Got 0 new map outputs from tasktracker and 0 map
>outputs from previous failures
>2007-08-03 12:07:09,413 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Got 2 known map output location(s); scheduling...
>2007-08-03 12:07:09,413 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Scheduled 0 of 2 known outputs (2 slow hosts and 0
>dup hosts)
>2007-08-03 12:07:14,415 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Need 1 map output(s)
>2007-08-03 12:07:14,415 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Got 0 new map outputs from tasktracker and 0 map
>outputs from previous failures
>2007-08-03 12:07:14,415 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Got 2 known map output location(s); scheduling...
>2007-08-03 12:07:14,415 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Scheduled 0 of 2 known outputs (2 slow hosts and 0
>dup hosts)
>2007-08-03 12:07:19,417 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Need 1 map output(s)
>2007-08-03 12:07:19,418 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Got 0 new map outputs from tasktracker and 0 map
>outputs from previous failures
>2007-08-03 12:07:19,418 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Got 2 known map output location(s); scheduling...
>2007-08-03 12:07:19,418 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Scheduled 0 of 2 known outputs (2 slow hosts and 0
>dup hosts)
>2007-08-03 12:07:24,419 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Need 1 map output(s)
>
>
>
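
For what it's worth, the stall in the log above (every round reporting "Scheduled 0 of 2 known outputs" with both hosts marked slow) can be counted across a whole reduce task log with a few lines of Python. This is only a rough sketch: it assumes the log lines look exactly like the ones quoted here, and the function name summarize_scheduling and the keys it returns are made up for illustration, not anything Hadoop provides.

```python
import re

# Pattern for the ReduceTask scheduling summary as it appears in the quoted log,
# e.g. "Scheduled 0 of 2 known outputs (2 slow hosts and 0 dup hosts)".
# Assumption: other Hadoop versions may word this line differently.
SCHEDULED = re.compile(
    r"Scheduled\s+(\d+)\s+of\s+(\d+)\s+known\s+outputs\s+"
    r"\((\d+)\s+slow\s+hosts\s+and\s+(\d+)\s+dup\s+hosts\)"
)


def summarize_scheduling(log_text):
    """Count scheduling rounds and how many of them scheduled no fetches."""
    # Mail quoting wraps one log entry over several '>'-prefixed lines,
    # so strip the quote markers and join everything into one string first.
    flat = " ".join(line.lstrip("> ").strip() for line in log_text.splitlines())

    summary = {"rounds": 0, "stalled_rounds": 0, "max_slow_hosts": 0}
    for scheduled, known, slow, _dup in SCHEDULED.findall(flat):
        summary["rounds"] += 1
        # A round is "stalled" if outputs were known but none were scheduled.
        if int(scheduled) == 0 and int(known) > 0:
            summary["stalled_rounds"] += 1
        summary["max_slow_hosts"] = max(summary["max_slow_hosts"], int(slow))
    return summary


if __name__ == "__main__":
    sample = """\
>2007-08-03 12:06:54,408 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Scheduled 0 of 2 known outputs (2 slow hosts and 0
>dup hosts)
>2007-08-03 12:06:59,409 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Need 1 map output(s)
>2007-08-03 12:06:59,410 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Scheduled 0 of 2 known outputs (2 slow hosts and 0
>dup hosts)
"""
    print(summarize_scheduling(sample))
```

If nearly every round comes back stalled, that fits the fetch-failure picture described above rather than a network bandwidth problem, though the script by itself cannot tell you why the hosts were marked slow.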
