pig-user  

RE: Only One reducer can get the total log num

Olga Natkovich
Fri, 26 Sep 2008 09:02:48 -0700

This is fine. The combiner pre-aggregates the data on the map side, and
that is done in parallel. The final result has to be computed by a
single reducer, since you want a single value in your output.
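
To make the behaviour concrete, here is a toy simulation in plain Python
(not Pig or Hadoop code). The function names, the "all" key, and the use
of Python's hash() as a partitioner are assumptions made up for this
sketch. It only illustrates the idea: GROUP ... ALL yields a single
group, so every map-side partial count carries the same key and lands on
the same reducer, however many reducers PARALLEL asks for.

```python
def map_with_combiner(split):
    """Map side: pre-aggregate one input split into a single partial count."""
    # Every record belongs to the same group, so the key is a constant.
    return ("all", len(split))


def partition(key, num_reducers):
    """Choose a reducer for a key (hash() stands in for a real partitioner)."""
    return hash(key) % num_reducers


def run_job(splits, num_reducers=4):
    """Return (total per reducer, number of partial counts each reducer received)."""
    # Map tasks are independent; on a real cluster they run in parallel.
    partials = [map_with_combiner(split) for split in splits]

    # Shuffle: route each partial count to the reducer chosen for its key.
    reducer_inputs = [[] for _ in range(num_reducers)]
    for key, count in partials:
        reducer_inputs[partition(key, num_reducers)].append(count)

    # Reduce: each reducer sums whatever it received.
    totals = [sum(counts) for counts in reducer_inputs]
    received = [len(counts) for counts in reducer_inputs]
    return totals, received


if __name__ == "__main__":
    # Three hypothetical input splits holding 3, 5 and 2 log records.
    splits = [["a"] * 3, ["b"] * 5, ["c"] * 2]
    totals, received = run_job(splits, num_reducers=4)
    print(totals)    # one entry holds the full count, the rest are 0
    print(received)  # only one reducer received any input
```

The single busy reducer only has to add up one small number per map
task, so it is not the bottleneck. The expensive counting has already
happened in parallel on the map side.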

Olga 

> -----Original Message-----
> From: paradisehit [EMAIL PROTECTED] 
> Sent: Thursday, September 25, 2008 9:28 PM
> To: pig-user
> Subject: Only One reducer can get the total log num
> 
> 
> I use a script like this:
> 
> querys = GROUP clear_log ALL PARALLEL 4;
> TOTAL = FOREACH querys GENERATE FLATTEN(clear_log.($1, $2)), COUNT($1);
> 
> STORE TOTAL INTO 'total';
> 
> When I look at the monitoring page in the Hadoop JobTracker, I
> see that only one reducer processes the data, and the other 3
> reducers process 0 MB of data.
> 
> I think this should be changed, but how can I change it? 
> 
> Help me!!
>