Sebastien Rainville wrote:
I am new to Hadoop. Looking at the documentation, I figured out how to
write map and reduce functions but now I'm stuck... How do we work with
the output file produced by the reducer? For example, the word count
example produces a file with words as keys and the number of occurrences
of each word as the values. Now, let's say I want to get the total
number of words by analyzing the output file... how am I supposed to do
it?
For global counts you can use counters:
http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/RunningJob.html#getCounters()
The framework includes a counter for the number of output records, which
is what you want in this case, so you don't even need to add a counter
for that.
For more complex summary statistics, if your output is very large, then
it might be appropriate to run another MapReduce job over the output
just to compute these.
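
As a rough illustration of what such a second pass would compute, here is a minimal sketch in plain Java. It does not use the Hadoop API. It assumes the word count output is text with one "word<TAB>count" record per line, and the class and method names (WordCountSummary, mapLine, totalOccurrences, distinctWords) are made up for this example. The number of lines corresponds to the number of output records (distinct words), while the sum of the counts gives the total number of word occurrences.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: mimics the map and reduce
// logic of a second pass over word count output, in memory.
class WordCountSummary {

    // "Map" step: parse one output line, assumed to be "word<TAB>count",
    // and return the count field.
    static long mapLine(String line) {
        String[] fields = line.split("\t");
        return Long.parseLong(fields[1].trim());
    }

    // "Reduce" step: sum the counts of all non-empty lines, giving the
    // total number of word occurrences.
    static long totalOccurrences(List<String> lines) {
        long total = 0;
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                total += mapLine(line);
            }
        }
        return total;
    }

    // Number of non-empty records, i.e. the number of distinct words.
    // This is the figure the output-records counter would report.
    static long distinctWords(List<String> lines) {
        long records = 0;
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                records++;
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for a reducer output file.
        List<String> sample = Arrays.asList("hadoop\t3", "map\t2", "reduce\t1");
        System.out.println("distinct words: " + distinctWords(sample));
        System.out.println("total occurrences: " + totalOccurrences(sample));
    }
}
```

In a real job the per-line parsing would go in the map function and the summation in the reduce function, with every record emitted under a single key so that one reducer sees all the counts.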
Doug