I am working with moonwatcher.

I think we found the problem: the dataset we were using has over 3000 files, 
and for each file hadoop has to start a separate task (a new JVM). After we 
combined all the files into one, the job finished very quickly.
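
For anyone hitting the same thing, merging the small files up front is easy to 
script. Below is a rough sketch using the HDFS FileSystem API; note that some 
of the calls (listStatus, isDirectory) are from later releases and may have 
different names in 0.12.x, and the class name is made up for the example:

    import java.io.InputStream;
    import java.io.OutputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    // Copy every file under an input directory into one HDFS file, so the
    // job starts one map task instead of thousands.
    public class MergeSmallFiles {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path inputDir = new Path(args[0]);  // directory of small log files
        Path merged = new Path(args[1]);    // single combined output file

        OutputStream out = fs.create(merged);
        try {
          for (FileStatus status : fs.listStatus(inputDir)) {
            if (status.isDirectory()) continue;  // skip subdirectories
            InputStream in = fs.open(status.getPath());
            try {
              IOUtils.copyBytes(in, out, conf, false);  // false: keep 'out' open
            } finally {
              in.close();
            }
          }
        } finally {
          out.close();
        }
      }
    }

The shell's 'hadoop fs -getmerge <dir> <local file>' does roughly the same 
merge to the local filesystem, if your version has it.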

I think the task JVMs should be pooled and reused.
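
(For what it's worth, later hadoop releases added exactly this: a 
mapred.job.reuse.jvm.num.tasks job property that lets one task JVM run several 
tasks in sequence. It does not exist in 0.12.3; this sketch assumes a release 
that has it, and MyJobDriver is a made-up class name:)

    import org.apache.hadoop.mapred.JobConf;

    // 'mapred.job.reuse.jvm.num.tasks' is from later releases, not 0.12.x.
    // A value of -1 lets a task JVM be reused without limit.
    JobConf job = new JobConf(MyJobDriver.class);
    job.setInt("mapred.job.reuse.jvm.num.tasks", -1);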

ds

moonwatcher <[EMAIL PROTECTED]> wrote: 
  i am using hadoop-0.12.3 on a single-node cluster, running the hdfs daemons, 
the job tracker, and the task tracker.
  the dataset is about 12 MB of log files.
  
  other information:
  during the map phase, cpu usage was very high, close to 100%.
  during the reduce phase, cpu usage was near zero, usually 1% or 2%. the 
reduce phase did complete eventually, though, and produced the correct output. 
this behaviour is consistent across runs.
  
  thanks,
  mw

Doug Cutting wrote:
What version of Hadoop are you using? On what sort of a cluster? How big is 
your dataset?

Doug

moonwatcher wrote:
> hey guys,
>
> i've set up hadoop in distributed mode (jobtracker, tasktracker, and hdfs 
> daemons), and i'm observing that the map phase executes really quickly but 
> the reduce phase is really slow. the application simply reads some log 
> files, whose lines consist of key-value pairs, and summarizes based on the 
> keys, summing the values... so this seems like an ideal application of 
> hadoop.
>
> could you suggest where the bottleneck might be? by logging, i observed 
> that it is not in my reducer implementation. could it be in the RPC, or in 
> the sort or copy phases?
> would there be any particular properties that should be tweaked?
>
> thanks and best regards,
> mw
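
For reference, the job moonwatcher describes would look roughly like the 
sketch below in the old org.apache.hadoop.mapred API (as it appears in later 
0.x releases). The tab-separated "key<TAB>value" line format and all class 
names here are assumptions, not his actual code:

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class LogSum {
      // Mapper: parse each line as "key<TAB>value" and emit (key, value).
      public static class SumMapper extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, LongWritable> {
        public void map(LongWritable offset, Text line,
                        OutputCollector<Text, LongWritable> out, Reporter rep)
            throws IOException {
          String[] parts = line.toString().split("\t", 2);
          if (parts.length == 2) {
            out.collect(new Text(parts[0]),
                        new LongWritable(Long.parseLong(parts[1].trim())));
          }
        }
      }

      // Reducer: sum every value seen for a key.
      public static class SumReducer extends MapReduceBase
          implements Reducer<Text, LongWritable, Text, LongWritable> {
        public void reduce(Text key, Iterator<LongWritable> values,
                           OutputCollector<Text, LongWritable> out, Reporter rep)
            throws IOException {
          long sum = 0;
          while (values.hasNext()) {
            sum += values.next().get();
          }
          out.collect(key, new LongWritable(sum));
        }
      }
    }

Registering the reducer as a combiner too (conf.setCombinerClass) shrinks the 
data the sort/copy phase has to move, which is usually where a "slow reduce" 
is actually spent.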


       