Re: slowness in hadoop reduce phase when using distributed mode

moonwatcher Thu, 03 May 2007 13:02:23 -0700

  i am using hadoop-0.12.3.
  i am using a single-node cluster -- running the hdfs daemons, job tracker, and task tracker.
  the dataset is about 12 MB of log files.
  
  other information:
  during the map phase, cpu usage went really high, close to 100%.
  during the reduce phase, cpu usage was near zero, usually 1% or 2%. but the reduce phase did complete eventually, and produced the correct output. this behaviour is consistent across runs.
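
  for reference, the aggregation described in the original post quoted below (key-value log lines, summed per key) can be sketched roughly as follows. this is plain Python rather than the actual Hadoop job, and it makes assumptions that are not stated in this thread: each line is taken to be "key value" separated by whitespace, values are taken to be integers, and the names map_line and sum_by_key are hypothetical.

```python
from collections import defaultdict


def map_line(line):
    """Map step: parse one log line into a (key, value) pair.

    Assumed format (not confirmed by the thread): "key value", whitespace
    separated, with an integer value. Malformed lines yield None.
    """
    parts = line.split()
    if len(parts) != 2:
        return None
    key, raw_value = parts
    try:
        return key, int(raw_value)
    except ValueError:
        return None


def sum_by_key(lines):
    """Reduce step: group the mapped pairs by key and sum their values."""
    totals = defaultdict(int)
    for line in lines:
        pair = map_line(line)
        if pair is None:
            continue  # skip lines that do not match the assumed format
        key, value = pair
        totals[key] += value
    return dict(totals)
```

  calling sum_by_key(["a 1", "b 2", "a 3"]) gives the per-key totals as a dict. the per-record work in the reduce step is a single addition, which fits the observation that the reducer code itself is not where the time goes.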
  
  thanks,
  mw


Doug Cutting <[EMAIL PROTECTED]> wrote:
What version of Hadoop are you using? On what sort of a cluster? How big is your dataset?

Doug

moonwatcher wrote:
> hey guys,
>   
> i've set up hadoop in distributed mode (jobtracker, tasktracker, and hdfs
> daemons), and am observing that the map phase executes really quickly but the
> reduce phase is really slow. the application simply reads some log files,
> whose lines consist of key-value pairs, and summarizes based on the keys,
> summing the values... so this seems like an ideal application of hadoop.
>   
> could you suggest where the bottleneck might be? by logging, i observed that
> it is not in my reducer implementation. could it be in the RPC? or the sort
> or copying phases?
> are there any particular properties that should be tweaked?
>   
>   thanks and best regards,
>   mw


       
