Hello,
Great observation; here's a hack that may be helpful
until such functionality is built into Hadoop.
You can declare static Java collections inside your class that
implements Reducer, but outside of your reduce method. TreeMaps work
well for this, or a HashMap guarded by an if (!map.containsKey(...))
check. Iterate through your intermediate values in the while loop,
placing items in these inner dictionary structures. After the while
loop (still inside reduce), read the collected data back out and send
it to the OutputCollector, with the session id as the
WritableComparable key and a Text value.
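
A rough sketch of that idea with the old org.apache.hadoop.mapred API
(class and field names here are just illustrative, and I use an
instance field rather than a static one, which behaves the same way
within a single reduce task):

import java.io.IOException;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class SessionReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  // Lives in the class, outside reduce(), so it can accumulate items
  // while the values iterator is consumed; cleared on each call.
  private final TreeMap<String, Integer> counts =
      new TreeMap<String, Integer>();

  public void reduce(Text sessionId, Iterator<Text> values,
                     OutputCollector<Text, Text> output,
                     Reporter reporter) throws IOException {
    counts.clear();
    // Iterate through the intermediate values, filling the dictionary.
    while (values.hasNext()) {
      String item = values.next().toString();
      Integer n = counts.get(item);
      counts.put(item, n == null ? 1 : n + 1);
    }
    // After the while loop, emit the collected data with the session
    // id as the (WritableComparable) key and a Text value.
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      output.collect(sessionId, new Text(e.getKey() + "\t" + e.getValue()));
    }
  }
}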
Regards,
Peter W.
On Aug 22, 2007, at 10:55 AM, Ted Dunning wrote:
I am finding a common pattern: the multi-phase map-reduce programs I
need to write very often have nearly degenerate map functions in the
second and later map-reduce phases. The only purpose of these
functions is to select the next reduce key, and very often a local
combiner can be used to greatly decrease the number of records passed
to the second reduce.
It isn't hard to implement these programs as multiple fully fledged
map-reduces, but it appears to me that many of them would be better
expressed as something more like a map-reduce-reduce program.
For example, take the problem of co-occurrence counting in log
records. The first map would extract a user id and an object id and
group on user id. The first reduce would take entire sessions for a
single user and generate co-occurrence pairs as keys for the second
reduce, each with a count determined by the frequency of the objects
in the user history. The second reduce (and local combiner) would
aggregate these counts and discard items with small counts.
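
(A rough sketch of that first, pair-generating reduce in the old
org.apache.hadoop.mapred API; the names are made up, and the
frequency weighting is simplified to one count per observed pair,
which the combiner / second reduce would then sum.)

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class PairGeneratingReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);

  public void reduce(Text userId, Iterator<Text> objectIds,
                     OutputCollector<Text, IntWritable> output,
                     Reporter reporter) throws IOException {
    // Collect one user's whole session.
    List<String> session = new ArrayList<String>();
    while (objectIds.hasNext()) {
      session.add(objectIds.next().toString());
    }
    // Emit every unordered pair of objects seen in the same session
    // as a key for the second reduce to aggregate.
    for (int i = 0; i < session.size(); i++) {
      for (int j = i + 1; j < session.size(); j++) {
        output.collect(new Text(session.get(i) + "," + session.get(j)), ONE);
      }
    }
  }
}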
Expressed conventionally, this would require writing all of the user
sessions to HDFS, and a second map phase would generate the pairs for
counting. The opportunity for efficiency would come from the ability
to avoid writing intermediate results to the distributed data store.
Has anybody looked at whether this would help and whether it would
be hard
to do?