pig-user  

Re: Only One reducer can get the total log num

Mridul Muralidharan
Mon, 29 Sep 2008 19:09:35 -0700


Does the combiner get invoked in this specific case?
I thought this query did not fit the pattern described in [1] for invoking the combiner. Assuming I am not wrong, if there are newer patterns under which the combiner is invoked, it would be great to have them documented somewhere (preferably in the bug or on a wiki page).


Thanks,
Mridul

[1] http://issues.apache.org/jira/browse/PIG-7

Olga Natkovich wrote:
This is fine. The combiner is used to pre-aggregate the data on the map side,
and that is done in parallel. The final result has to be computed by a
single reducer, since you want a single value in your output.

Olga
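Olga's point can be illustrated with a small Python simulation of the data flow for a GROUP ... ALL count. This is a hypothetical model, not Pig's actual implementation: `map_task`, `combine` and `reduce_task` are made-up names, and each input split stands in for one map task. Every record gets the same key, the combiner collapses each map task's output into one partial count, and a single reducer adds up the partials.

```python
def map_task(split):
    # Map phase: every record is emitted under the same key, as GROUP ... ALL does.
    return [("all", 1) for _record in split]


def combine(pairs):
    # Combiner: runs on each map task's output (in parallel across map tasks),
    # collapsing many (key, 1) pairs into one (key, partial_count) pair per key.
    partial = {}
    for key, n in pairs:
        partial[key] = partial.get(key, 0) + n
    return list(partial.items())


def reduce_task(pairs):
    # Reducer: adds up the partial counts it receives for each key.
    totals = {}
    for key, n in pairs:
        totals[key] = totals.get(key, 0) + n
    return totals


# Three hypothetical input splits, one per map task.
splits = [["q1", "q2", "q3"], ["q4", "q5"], ["q6"]]

# Map + combine on each split; in Hadoop these would run concurrently.
combined = [combine(map_task(s)) for s in splits]
print(combined)  # [[('all', 3)], [('all', 2)], [('all', 1)]]

# The shuffle delivers every partial for key "all" to the one reducer.
shuffled = [pair for part in combined for pair in part]
print(reduce_task(shuffled))  # {'all': 6}
```

With the combiner, the single reducer receives one record per map task instead of one per input line, so the serial part of the job stays small even though only one reducer does any work.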
-----Original Message-----
From: paradisehit [EMAIL PROTECTED]
Sent: Thursday, September 25, 2008 9:28 PM
To: pig-user
Subject: Only One reducer can get the total log num

I use a script like this:

querys = GROUP clear_log ALL PARALLEL 4;
TOTAL = FOREACH querys GENERATE FLATTEN(clear_log.($1, $2)), COUNT($1);

STORE TOTAL INTO 'total';

I looked at the monitoring page in the Hadoop JobTracker and saw that only one reducer processes the data, while the other 3 reducers process 0 MB of data.

I think this should be changed, but how can I change it?
Please help!
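The reducer imbalance seen in the JobTracker follows from how the shuffle assigns keys to reducers. GROUP ... ALL gives every record the same key, and a hash partitioner sends all records with one key to the same reducer, so PARALLEL 4 leaves three reducers idle. The Python sketch below models this under stated assumptions: it uses the formula of Hadoop's default HashPartitioner and models the group key as the Java string "all". The key object Pig really emits may hash to a different value, but a single key still maps to exactly one reducer. `java_string_hash`, `partition` and `records_per_reducer` are hypothetical helper names.

```python
def java_string_hash(s):
    # Mimics Java's String.hashCode(): h = 31*h + char, wrapped to a signed 32-bit int.
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def partition(key, num_reducers):
    # Same formula as Hadoop's default HashPartitioner:
    # (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def records_per_reducer(keys, num_reducers):
    # Count how many map-output records each reducer would receive.
    loads = [0] * num_reducers
    for key in keys:
        loads[partition(key, num_reducers)] += 1
    return loads


# GROUP ... ALL with PARALLEL 4: all 1000 records carry the same key.
print(records_per_reducer(["all"] * 1000, 4))  # [0, 1000, 0, 0]

# Grouping on a field with many distinct values spreads the load instead.
print(records_per_reducer([f"q{i}" for i in range(1, 9)], 4))  # [2, 2, 2, 2]
```

Whatever hash value the single key produces, it lands in exactly one partition, so raising PARALLEL cannot spread a GROUP ... ALL across more reducers. Only the map-side pre-aggregation runs in parallel.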