Hi Andy,
Take a look at this piece of code:
Counters counters = job.getCounters();
long reduceInputRecords = counters.findCounter(
    "org.apache.hadoop.mapred.Task$Counter",
    "REDUCE_INPUT_RECORDS").getCounter();
This one gives you reduce input records, but I believe there is also a counter
for reduce output records. You'll have to dig into the source to find its exact
name because, unfortunately, the default counters associated with map/reduce
jobs are not exposed publicly yet.
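If what you ultimately want is to have that total available inside your
reducers, one way is to let the driver read the counter after the first job
finishes and hand the value to the second job through its configuration.
Something along these lines (untested sketch; DfDriver and the df.total.docs
key are made-up names, and using MAP_INPUT_RECORDS as the document count
assumes one document per input record):

import java.io.IOException;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class DfDriver {
  public static void main(String[] args) throws IOException {
    // Job 1: any pass over the corpus (your "counting" job). Its
    // MAP_INPUT_RECORDS counter already tells you how many documents
    // were read, so the job itself needs no counting logic.
    JobConf countConf = new JobConf(DfDriver.class);
    // ... input/output paths, mapper class, etc. ...
    RunningJob countJob = JobClient.runJob(countConf);  // blocks until done

    long totalDocs = countJob.getCounters().findCounter(
        "org.apache.hadoop.mapred.Task$Counter",
        "MAP_INPUT_RECORDS").getCounter();

    // Job 2: the document-frequency job. Stash the total in its
    // configuration so every reducer can read it in configure().
    JobConf dfConf = new JobConf(DfDriver.class);
    dfConf.setLong("df.total.docs", totalDocs);
    // ... mapper emits <word, 1> per unique word per document ...
    JobClient.runJob(dfConf);
  }
}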
-Jim
On Tue, Apr 14, 2009 at 11:19 AM, Andy Liu <[email protected]> wrote:
> Is there a way for all the reducers to have access to the total number of
> records that were processed in the Map phase?
>
> For example, I'm trying to perform a simple document frequency calculation.
> During the map phase, I emit <word, 1> pairs for every unique word in every
> document. During the reduce phase, I sum the values for each word group.
> Then I want to divide that value by the total number of documents.
>
> I suppose I can create a whole separate m/r job whose sole purpose is to
> count all the records, then pass that number on. Is there a more
> straightforward way of doing this?
>
> Andy
>
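In the reducer of that second job you can then read the total back out in
configure() and do the division there. Another rough, untested sketch (the
Text/LongWritable types and the df.total.docs key are assumptions carried over
from the driver snippet above, not from your actual code):

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class DfReducer extends MapReduceBase
    implements Reducer<Text, LongWritable, Text, DoubleWritable> {

  private long totalDocs;

  @Override
  public void configure(JobConf conf) {
    // Value the driver stored after the first job completed.
    totalDocs = conf.getLong("df.total.docs", 1L);
  }

  public void reduce(Text word, Iterator<LongWritable> counts,
                     OutputCollector<Text, DoubleWritable> output,
                     Reporter reporter) throws IOException {
    long sum = 0;
    while (counts.hasNext()) {
      sum += counts.next().get();
    }
    // Sum of the <word, 1> pairs = number of documents containing the word;
    // divide by the total number of documents to get document frequency.
    output.collect(word, new DoubleWritable((double) sum / totalDocs));
  }
}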