pig-user  

RE: Only One reducer can get the total log num

Olga Natkovich
Tue, 30 Sep 2008 09:07:00 -0700

You can see whether the combiner is invoked by running

explain TOTAL;

In any case, to produce a single result you need one reducer. We are
reworking the way the combiner is invoked in the types branch; you can
try that.
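
As a rough illustration of the flow discussed in this thread (partial
counts computed in parallel on the map side, then one reducer producing
the single final value), here is a minimal sketch in plain Python. It is
not Pig or Hadoop code; the function names, the round-robin split, and
the sample data are all made up for illustration.

```python
# Illustrative sketch only: plain Python, not Pig or Hadoop code.
# All names here (split_records, combine_count, reduce_total) are hypothetical.

def split_records(records, num_maps):
    """Deal records round-robin into num_maps input splits."""
    return [records[i::num_maps] for i in range(num_maps)]

def combine_count(split):
    """Map-side pre-aggregation: each map emits one partial count."""
    return len(split)

def reduce_total(partial_counts):
    """GROUP ... ALL yields a single key, so one reducer sums all partials."""
    return sum(partial_counts)

log_lines = [f"query-{i}" for i in range(10)]
partials = [combine_count(s) for s in split_records(log_lines, 4)]
total = reduce_total(partials)
print(partials, total)
```

Because GROUP ... ALL produces only one key, the partial counts all go
to the same reducer; any additional reducers requested with PARALLEL
have no key to work on.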


Olga

> -----Original Message-----
> From: Mridul Muralidharan [EMAIL PROTECTED] 
> Sent: Monday, September 29, 2008 7:07 PM
> To: pig-user@incubator.apache.org
> Subject: Re: Only One reducer can get the total log num
> 
> 
> Does the combiner get invoked in this specific case?
> I thought it did not fit the pattern mentioned in [1] for
> invoking combiners. Assuming I am not wrong, if there are
> newer patterns where the combiner is invoked, it would be great
> if they were documented somewhere (preferably in the bug or on
> a wiki page).
> 
> 
> Thanks,
> Mridul
> 
> [1] http://issues.apache.org/jira/browse/PIG-7
> 
> Olga Natkovich wrote:
> > This is fine. Combiner is used to preaggregate the data on the map
> > side and that is done in parallel. The final result has to be computed
> > by a single reducer since you do want to get a single value in your
> > output.
> > 
> > Olga
> > 
> >> -----Original Message-----
> >> From: paradisehit [EMAIL PROTECTED]
> >> Sent: Thursday, September 25, 2008 9:28 PM
> >> To: pig-user
> >> Subject: Only One reducer can get the total log num
> >>
> >>  
> >>  
> >>  I use the script like this:
> >> querys = GROUP clear_log ALL PARALLEL 4;
> >> TOTAL = FOREACH querys GENERATE FLATTEN(clear_log.($1, $2)), COUNT($1);
> >>
> >> STORE TOTAL INTO 'total';
> >>
> >> And I see on the monitor page of the Hadoop jobtracker that only one
> >> reducer processes the data, while the other 3 reducers just process
> >> 0M of data?
> >>
> >> I think this should be changed, but how can I change it? 
> >>
> >> Help me!!
> >>