slowness in hadoop reduce phase when using distributed mode

moonwatcher Thu, 03 May 2007 11:21:32 -0700

hey guys,
  
  i've set up hadoop in distributed mode (jobtracker, tasktracker, and hdfs 
daemons), and i'm observing that the map phase executes really quickly but the 
reduce phase is really slow. the application simply reads some log files, 
whose lines consist of key-value pairs, and summarizes based on the keys, 
summing the values... so this seems like an ideal application of hadoop.
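  
  to make the job concrete, here is a rough sketch of the logic in plain python. it is not the actual hadoop job and uses no hadoop api; the function names and the tab-separated "key<TAB>value" line format with integer values are assumptions made only for illustration.

```python
from collections import defaultdict


def map_line(line):
    """Map step: parse one log line into a (key, value) pair.

    Assumes a hypothetical "key<TAB>integer" format; returns None
    for any line that does not match it.
    """
    parts = line.strip().split("\t")
    if len(parts) != 2:
        return None
    key, raw_value = parts
    try:
        return key, int(raw_value)
    except ValueError:
        return None


def reduce_sum(pairs):
    """Reduce step: sum the values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def summarize(lines):
    """Run the map step over every line, drop malformed lines, then reduce."""
    pairs = (pair for pair in map(map_line, lines) if pair is not None)
    return reduce_sum(pairs)
```

  for example, summarize(["a\t1", "b\t2", "a\t3"]) gives one total per key. in the distributed job, the pairs emitted by the map step are copied to the reducers and sorted by key before the summing happens.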
  
  could you suggest where the bottleneck might be? by logging, i observed that 
it is not in my reducer implementation. could it be in the RPC? or the sort or 
copying phases?
  are there any particular properties that should be tweaked?
  
  thanks and best regards,
  mw
  
  
       
