Olga Natkovich
Fri, 26 Sep 2008 09:02:48 -0700
This is fine. The combiner is used to pre-aggregate the data on the map side, and that is done in parallel. The final result has to be computed by a single reducer, since you do want to get a single value in your output. Because GROUP ... ALL places every record in one group, only one reducer receives data, whatever PARALLEL is set to.

Olga

> -----Original Message-----
> From: paradisehit [EMAIL PROTECTED]
> Sent: Thursday, September 25, 2008 9:28 PM
> To: pig-user
> Subject: Only One reducer can get the total log num
>
> I use the script like this:
>
> querys = GROUP clear_log ALL PARALLEL 4;
> TOTAL = FOREACH querys GENERATE FLATTEN(clear_log.($1, $2)), COUNT($1);
> STORE TOTAL INTO 'total';
>
> AND I see the monitor page in the hadoop jobtracker, and I
> see that only one reduce process the data, and other 3
> reducers just process 0M data?
>
> I think this should be changed, but how can I change it?
>
> Help me!!
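
To illustrate the point in the reply about map-side pre-aggregation, here is a minimal sketch in plain Python (not Pig or Hadoop code). The function names `combine`, `reduce_total`, and `total_count` are hypothetical, and the input splits are made-up sample data. It only shows the idea: each split is counted independently, which is the step that can run in parallel, and a single final step adds up the partial counts.

```python
from typing import Iterable, List


def combine(split: Iterable[str]) -> int:
    """Map-side pre-aggregation (hypothetical helper): count one split's records."""
    return sum(1 for _ in split)


def reduce_total(partial_counts: Iterable[int]) -> int:
    """Single final step (hypothetical helper): add the partial counts together."""
    return sum(partial_counts)


def total_count(splits: List[List[str]]) -> int:
    # In a real job each split would be handled by a separate map task;
    # here they are simply processed one after another.
    partials = [combine(split) for split in splits]
    # Only the small list of partial counts reaches the final step.
    return reduce_total(partials)


if __name__ == "__main__":
    # Made-up sample data: four splits of log records.
    sample_splits = [["a", "b", "c"], ["d"], [], ["e", "f"]]
    print(total_count(sample_splits))
```

In this sketch the final step sees one small number per split rather than every record, which is why a single reducer is usually not a bottleneck for a global count.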