Is there a way for all the reducers to have access to the total number of records that were processed in the Map phase?
For example, I'm trying to perform a simple document frequency calculation. During the map phase, I emit <word, 1> pairs for every unique word in every document. During the reduce phase, I sum the values for each word group. Then I want to divide that sum by the total number of documents. I suppose I could create a whole separate MapReduce job whose sole purpose is to count all the records, then pass that number on to the reducers. Is there a more straightforward way of doing this?

Andy
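
To make the calculation concrete, here is a minimal sketch in plain Python. It is not Hadoop code: the function names map_phase and reduce_phase, the sample documents, and the whitespace tokenization are all assumptions made up for illustration. It only simulates the two phases in memory, with the total document count computed outside the job and handed to the reduce step, which is exactly the value a real reducer would need access to.

```python
from collections import defaultdict


def map_phase(documents):
    """Emit a (word, 1) pair for every unique word in every document."""
    for doc in documents:
        # set() ensures each word is counted at most once per document
        for word in set(doc.lower().split()):
            yield word, 1


def reduce_phase(pairs, total_docs):
    """Sum the values for each word, then divide by the total document count."""
    counts = defaultdict(int)
    for word, value in pairs:
        counts[word] += value
    return {word: count / total_docs for word, count in counts.items()}


# Hypothetical input: one string per document
documents = ["the cat sat", "the dog ran", "a cat ran"]

# The total the reducers need; here it is simply known up front
total_docs = len(documents)

doc_freq = reduce_phase(map_phase(documents), total_docs)
```

In this toy version the total is trivially available because everything runs in one process; in a distributed job the open question is how each reducer obtains total_docs.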
