Olga Natkovich
Tue, 30 Sep 2008 09:07:00 -0700
You can see whether the combiner is invoked or not by running explain TOTAL;

In any case, to produce a single result you need a single reducer. We are
reworking the way the combiner is invoked in the types branch. You can try that.

Olga

> -----Original Message-----
> From: Mridul Muralidharan [EMAIL PROTECTED]
> Sent: Monday, September 29, 2008 7:07 PM
> To: pig-user@incubator.apache.org
> Subject: Re: Only One reducer can get the total log num
>
>
> Does the combiner get invoked in this specific case?
> I thought it did not fit the pattern mentioned in [1] for
> invoking combiners. Assuming I am not wrong, if there are
> newer patterns where the combiner is invoked, it would be great if
> they were documented somewhere (preferably in the bug or on a
> wiki page).
>
> Thanks,
> Mridul
>
> [1] http://issues.apache.org/jira/browse/PIG-7
>
> Olga Natkovich wrote:
> > This is fine. The combiner is used to preaggregate the data on the map
> > side, and that is done in parallel. The final result has to be computed
> > by a single reducer since you want a single value in your output.
> >
> > Olga
> >
> >> -----Original Message-----
> >> From: paradisehit [EMAIL PROTECTED]
> >> Sent: Thursday, September 25, 2008 9:28 PM
> >> To: pig-user
> >> Subject: Only One reducer can get the total log num
> >>
> >> I use a script like this:
> >>
> >> querys = GROUP clear_log ALL PARALLEL 4;
> >> TOTAL = FOREACH querys GENERATE FLATTEN(clear_log.($1, $2)), COUNT($1);
> >> STORE TOTAL INTO 'total';
> >>
> >> On the monitor page of the Hadoop jobtracker I see that
> >> only one reducer processes the data, and the other 3 reducers process
> >> 0M of data.
> >>
> >> I think this should be changed, but how can I change it?
> >>
> >> Help me!!
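
The behaviour discussed in this thread (parallel map-side preaggregation, then a single reducer for a GROUP ... ALL) can be illustrated with a small sketch. This is plain Python, not Pig or Hadoop code: the names combine, reducer_for and reduce_all, the sample splits, and the toy partitioner are all made up for illustration and only approximate what the framework does.

```python
# Toy model of a global count: GROUP ... ALL puts every record under one key,
# so all partial results are routed to the same reducer.

NUM_REDUCERS = 4          # mirrors PARALLEL 4 in the script above
GROUP_KEY = "all"         # GROUP ... ALL yields a single group key


def combine(split):
    """Map-side preaggregation: collapse one input split to a partial count."""
    return len(split)


def reducer_for(key, num_reducers):
    """Hypothetical partitioner: a deterministic hash of the key, mod reducers."""
    return sum(key.encode("utf-8")) % num_reducers


def reduce_all(partials):
    """Reduce side: add up the partial counts received for one key."""
    return sum(partials)


# Three hypothetical input splits, each handled by its own map task.
splits = [["q1", "q2", "q3"], ["q4"], ["q5", "q6"]]

# Each split is preaggregated independently (this part runs in parallel).
partials = [combine(split) for split in splits]

# Route every partial count by its key; there is only one key.
buckets = {r: [] for r in range(NUM_REDUCERS)}
for partial in partials:
    buckets[reducer_for(GROUP_KEY, NUM_REDUCERS)].append(partial)

busy = [r for r, received in buckets.items() if received]
idle = [r for r, received in buckets.items() if not received]
total = reduce_all(buckets[busy[0]])

print("reducers with data:", busy)
print("reducers without data:", idle)
print("total:", total)
```

In this model the three idle reducers are a consequence of having a single group key, not of how many reducers were requested; the parallelism comes from the map-side step.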