Re: Working with the output files of a hadoop application

Doug Cutting Wed, 15 Aug 2007 09:18:30 -0700

Sebastien Rainville wrote:

I am new to Hadoop. Looking at the documentation, I figured out how to
write map and reduce functions but now I'm stuck... How do we work with
the output file produced by the reducer? For example, the word count
example produces a file with words as keys and the number of occurrences
of each word as the values. Now, let's say I want to get the total
number of words by analyzing the output file... how am I supposed to do
it?


For global counts you can use counters:

http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/RunningJob.html#getCounters()

The framework includes a counter for the number of output records, which is what you want in this case, so you don't even need to add a counter for that.

For more complex summary statistics, if your output is very large, then it might be appropriate to run another MapReduce job over the output just to compute these.
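
As a rough illustration of what such a second pass could compute, here is a minimal sketch in plain Python. It does not use the Hadoop API; it only simulates the map, group, and reduce steps locally. It assumes the word count output is text with one tab-separated "word, count" pair per line. The function names (parse_line, summary_map, summary_reduce, run_summary) and the keys "total_words" and "distinct_words" are hypothetical, chosen for this example only.

```python
from collections import defaultdict


def parse_line(line):
    """Split one 'word<TAB>count' output line into (word, count).

    Assumes tab-separated text output; adjust if your job writes another format.
    """
    word, count = line.rstrip("\n").rsplit("\t", 1)
    return word, int(count)


def summary_map(word, count):
    """Map step: emit each summary statistic under a fixed key (hypothetical keys)."""
    yield "total_words", count   # contributes this word's occurrences
    yield "distinct_words", 1    # contributes one distinct word


def summary_reduce(key, values):
    """Reduce step: sum all values emitted for one key."""
    return key, sum(values)


def run_summary(lines):
    """Simulate map, group-by-key, and reduce locally over output lines."""
    grouped = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        word, count = parse_line(line)
        for key, value in summary_map(word, count):
            grouped[key].append(value)
    return dict(summary_reduce(key, values) for key, values in grouped.items())


# Made-up sample lines standing in for a reducer output file.
sample_output = ["the\t3\n", "quick\t1\n", "fox\t2\n"]
stats = run_summary(sample_output)
print(stats)
```

In a real job the map and reduce logic would live in Mapper and Reducer classes and the framework would do the grouping; for small outputs, reading the part files with a local script like this may already be enough.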


Doug
