I have a set of (key, value) pairs. For each value there is a function f(value) that returns an integer. I want to generate a histogram over f(value) for my data set. For example, representing each value as [f(value)], if I have the data set

key1, [3]
key2, [4]
key3, [3]
key4, [5]

I'd want to produce

3, 2
4, 1
5, 1

because f(value) = 3 appears twice in my data set, while f(value) = 4 and f(value) = 5 each appear once.

I gather the right way to do this is to use the Aggregator framework, but I can't understand the documentation. I've read the API docs for ValueAggregatorDescriptor <http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/aggregate/ValueAggregatorDescriptor.html> and related classes, and looked at the Aggregate*.java files in the examples directory, but it's still not making sense to me. (This may in part be because the examples are still written for the old API while I'm working in the new API, though I'm not sure.)

Can someone point me to clearer documentation online or in print, or provide a simple example for my task?

Thanks.
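
P.S. To make the desired result concrete, here is a plain-Java sketch (no Hadoop involved) of the computation I'm after. The class name HistogramSketch and the helper histogram() are made up for illustration, and f is passed in as a stand-in for my real function. It only shows what the output should be, not how the Aggregator framework would produce it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToIntFunction;

// Hypothetical illustration only: counts how often each f(value) occurs.
public class HistogramSketch {

    // Returns a map from f(value) to the number of values that produce it,
    // sorted by f(value). Keys of the input map are ignored.
    static <V> Map<Integer, Integer> histogram(Map<String, V> data, ToIntFunction<V> f) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (V value : data.values()) {
            // Add 1 to the bucket for f(value), creating it if needed.
            counts.merge(f.applyAsInt(value), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The example data set from the question; each value already
        // holds f(value), so f is just the identity here (an assumption).
        Map<String, Integer> data = new LinkedHashMap<>();
        data.put("key1", 3);
        data.put("key2", 4);
        data.put("key3", 3);
        data.put("key4", 5);

        // Prints:
        // 3, 2
        // 4, 1
        // 5, 1
        histogram(data, Integer::intValue)
                .forEach((bucket, count) -> System.out.println(bucket + ", " + count));
    }
}
```

In MapReduce terms I would expect this to correspond to emitting (f(value), 1) from the mapper and summing the ones per key in the reducer, but whether that is what the Aggregator framework does for me is exactly what I'm unsure about.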
