Hi Andy,
Take a look at this piece of code:
Counters counters = job.getCounters();
long reduceInputRecords = counters.findCounter(
    "org.apache.hadoop.mapred.Task$Counter",
    "REDUCE_INPUT_RECORDS").getCounter();
This one gives you reduce input records, but I believe there is also a counter
for reduce output records. You'll have to dig into the source to find its exact
name because, unfortunately, the default counters associated with map/reduce
jobs are not exposed publicly yet.
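If what you ultimately want is to have that total available inside your
reducers, one way is to let the driver read the counter after the first job
finishes and hand the value to the second job through its configuration.
Something along these lines (untested sketch; DfDriver and the df.total.docs
key are made-up names, and using MAP_INPUT_RECORDS as the document count
assumes one document per input record):

import java.io.IOException;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class DfDriver {
  public static void main(String[] args) throws IOException {
    // Job 1: any pass over the corpus (your "counting" job). Its
    // MAP_INPUT_RECORDS counter already tells you how many documents
    // were read, so the job itself needs no counting logic.
    JobConf countConf = new JobConf(DfDriver.class);
    // ... input/output paths, mapper class, etc. ...
    RunningJob countJob = JobClient.runJob(countConf);  // blocks until done

    long totalDocs = countJob.getCounters().findCounter(
        "org.apache.hadoop.mapred.Task$Counter",
        "MAP_INPUT_RECORDS").getCounter();

    // Job 2: the document-frequency job. Stash the total in its
    // configuration so every reducer can read it in configure().
    JobConf dfConf = new JobConf(DfDriver.class);
    dfConf.setLong("df.total.docs", totalDocs);
    // ... mapper emits <word, 1> per unique word per document ...
    JobClient.runJob(dfConf);
  }
}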
-Jim
On Tue, Apr 14, 2009 at 11:19 AM, Andy Liu <[email protected]> wrote:
> Is there a way for all the reducers to have access to the total number of
> records that were processed in the Map phase?
>
> For example, I'm trying to perform a simple document frequency calculation.
> During the map phase, I emit <word, 1> pairs for every unique word in every
> document. During the reduce phase, I sum the values for each word group.
> Then I want to divide that value by the total number of documents.
>
> I suppose I can create a whole separate m/r job whose sole purpose is to
> count all the records, then pass that number on. Is there a more
> straightforward way of doing this?
>
> Andy
>
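In the reducer of that second job you can then read the total back out in
configure() and do the division there. Another rough, untested sketch (the
Text/LongWritable types and the df.total.docs key are assumptions carried over
from the driver snippet above, not from your actual code):

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class DfReducer extends MapReduceBase
    implements Reducer<Text, LongWritable, Text, DoubleWritable> {

  private long totalDocs;

  @Override
  public void configure(JobConf conf) {
    // Value the driver stored after the first job completed.
    totalDocs = conf.getLong("df.total.docs", 1L);
  }

  public void reduce(Text word, Iterator<LongWritable> counts,
                     OutputCollector<Text, DoubleWritable> output,
                     Reporter reporter) throws IOException {
    long sum = 0;
    while (counts.hasNext()) {
      sum += counts.next().get();
    }
    // Sum of the <word, 1> pairs = number of documents containing the word;
    // divide by the total number of documents to get document frequency.
    output.collect(word, new DoubleWritable((double) sum / totalDocs));
  }
}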