Is there a way for all the reducers to have access to the total number of
records that were processed in the Map phase?

For example, I'm trying to perform a simple document frequency calculation.
During the map phase, I emit <word, 1> pairs for every unique word in every
document.  During the reduce phase, I sum the values for each word group.
Then I want to divide that value by the total number of documents.
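
To make the calculation concrete, here is a minimal sketch in plain Python
(not Hadoop code). The names map_phase, reduce_phase and total_docs are
made up for illustration, and the shuffle is simulated in memory. Note that
total_docs is passed in by hand here; that is exactly the value I don't
know how to get to the reducers in a real job.

```python
from collections import defaultdict

def map_phase(documents):
    """Emit a (word, 1) pair for every unique word in every document."""
    for doc in documents:
        for word in set(doc.split()):
            yield word, 1

def reduce_phase(pairs, total_docs):
    """Sum the values per word, then divide by the total document count."""
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return {word: count / total_docs for word, count in counts.items()}

# Hypothetical toy corpus: three tiny "documents".
docs = ["a b a", "b c", "b"]
df = reduce_phase(map_phase(docs), total_docs=len(docs))
```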

I suppose I can create a whole separate m/r job whose sole purpose is to
count all the records, then pass that number on.  Is there a more
straightforward way of doing this?

Andy
