Mridul Muralidharan
Mon, 29 Sep 2008 19:09:35 -0700
Does the combiner get invoked in this specific case? I thought it did not fit the pattern mentioned in [1] for invoking combiners. Assuming I am not wrong, if there are newer patterns where the combiner is invoked, it would be great if they were documented somewhere (preferably in the bug or on a wiki page).
Thanks,
Mridul

[1] http://issues.apache.org/jira/browse/PIG-7

Olga Natkovich wrote:
This is fine. The combiner is used to pre-aggregate the data on the map side, and that is done in parallel. The final result has to be computed by a single reducer, since you do want to get a single value in your output.

Olga

-----Original Message-----
From: paradisehit [EMAIL PROTECTED]
Sent: Thursday, September 25, 2008 9:28 PM
To: pig-user
Subject: Only One reducer can get the total log num

I use the script like this:

querys = GROUP clear_log ALL PARALLEL 4;
TOTAL = FOREACH querys GENERATE FLATTEN(clear_log.($1, $2)), COUNT($1);
STORE TOTAL INTO 'total';

I looked at the monitor page in the Hadoop JobTracker and saw that only one reducer processes the data, while the other 3 reducers process 0M of data. I think this should be changed, but how can I change it? Help me!!
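
For illustration, the behaviour Olga describes can be sketched as a small Python simulation. This is not Pig or Hadoop code: the function names, the sample splits, and the toy partition function are all made up here, and it assumes the combiner really is applied to the count. The idea is that GROUP ... ALL puts every record under a single key, so the map side can count in parallel, but all partial counts are routed to the same reducer regardless of how many reducers PARALLEL requests.

```python
from collections import defaultdict

# Assumption: GROUP ... ALL places every record under one key.
GROUP_ALL_KEY = "all"


def map_with_combiner(split):
    """Process one input split and pre-aggregate it locally (combiner step)."""
    partial = 0
    for _record in split:
        partial += 1  # each record contributes 1 to the count
    return (GROUP_ALL_KEY, partial)


def partition(key, num_reducers):
    """Pick a reducer for a key; a deterministic stand-in for a hash partitioner."""
    return sum(key.encode("utf-8")) % num_reducers


def run_job(splits, num_reducers):
    # Map side: each split is handled independently (in parallel on a cluster).
    map_outputs = [map_with_combiner(split) for split in splits]

    # Shuffle: route each (key, partial_count) pair to a reducer by its key.
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
    for key, partial in map_outputs:
        reducer_inputs[partition(key, num_reducers)][key].append(partial)

    # Reduce side: sum the partial counts for each key.
    return [
        {key: sum(partials) for key, partials in inputs.items()}
        for inputs in reducer_inputs
    ]


# Hypothetical input: three splits holding nine records in total.
splits = [["a", "b", "c"], ["d", "e"], ["f", "g", "h", "i"]]
results = run_job(splits, num_reducers=4)
for reducer_id, output in enumerate(results):
    print(reducer_id, output)
```

With one key there is only one partition that ever receives data, so three of the four reducers stay idle in this sketch, while the counting work itself is still spread across the map tasks.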