Re: slowness in hadoop reduce phase when using distributed mode

Doug Cutting Thu, 03 May 2007 11:31:37 -0700

What version of Hadoop are you using? On what sort of a cluster? How big is your dataset?


Doug


moonwatcher wrote:

hey guys,
i've set up hadoop in distributed mode (jobtracker, tasktracker, and hdfs daemons), and i'm observing that the map phase executes really quickly but the reduce phase is really slow. the application simply reads some log files, whose lines consist of key-value pairs, and summarizes by key, summing the values... so this seems like an ideal application of hadoop. could you suggest where the bottleneck might be? by logging, i observed that it is not in my reducer implementation. could it be in the RPC? or the sort or copying phases?
  would there be any properties that should be tweaked?
thanks and best regards,
  mw
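
For reference, the job described above (sum the values for each key) can be sketched in plain Python without Hadoop. This is only an illustration under assumptions: the line format `key value` with integer values is a guess, and `map_line`, `combine`, and `run_job` are hypothetical names, not Hadoop APIs. It shows one possible factor in a slow copy phase: with per-map-task pre-summing (the role a Hadoop combiner plays), fewer records have to be moved to the reducers, while the totals stay the same.

```python
from collections import defaultdict


def map_line(line):
    """Parse one log line into (key, value). Assumes 'key value' with an integer value."""
    key, value = line.split()
    return key, int(value)


def combine(pairs):
    """Pre-sum values per key within a single map task's output."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def run_job(splits, use_combiner):
    """Simulate map -> (optional combine) -> shuffle -> reduce.

    Returns (totals, records_shuffled), where records_shuffled counts the
    records that would be copied from the map side to the reduce side.
    """
    shuffled = []
    for split in splits:
        pairs = [map_line(line) for line in split]
        if use_combiner:
            pairs = combine(pairs)
        shuffled.extend(pairs)

    # Reduce step: sum all values that share a key.
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


# Two hypothetical input splits, one per map task.
splits = [
    ["a 1", "b 2", "a 3"],
    ["a 4", "b 5", "b 6"],
]

print(run_job(splits, use_combiner=False))
print(run_job(splits, use_combiner=True))
```

Whether this matters for the actual job depends on how many distinct keys each map task sees; if nearly every line has a unique key, pre-summing saves little.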
