On Aug 22, 2007, at 10:55 AM, Ted Dunning wrote:

I am finding it a common pattern that the multi-phase map-reduce programs
I need to write have nearly degenerate map functions in the second and later
map-reduce phases. The only purpose of these functions is to select the next
reduce key, and very often a local combiner can be used to greatly decrease
the number of records passed to the second reduce.
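
For readers following along, here is a minimal sketch of the kind of second-phase job Ted describes, written against the old org.apache.hadoop.mapred API of that era. The class name RekeyMapper and the assumption that phase-one output is tab-separated key/count lines are illustrative, not taken from the thread.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.LongSumReducer;

// Nearly degenerate second-phase mapper: it does no real computation,
// it only parses a phase-one record and selects the next reduce key.
public class RekeyMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, LongWritable> {

  public void map(LongWritable offset, Text line,
                  OutputCollector<Text, LongWritable> out, Reporter reporter)
      throws IOException {
    // Assumed phase-one output format: "nextKey<TAB>count"
    String[] fields = line.toString().split("\t");
    out.collect(new Text(fields[0]),
                new LongWritable(Long.parseLong(fields[1])));
  }

  // In the second-phase driver, reusing the sum reducer as a local
  // combiner cuts down the records shipped to the reduce, as suggested.
  public static void configure(JobConf conf) {
    conf.setMapperClass(RekeyMapper.class);
    conf.setCombinerClass(LongSumReducer.class);
    conf.setReducerClass(LongSumReducer.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(LongWritable.class);
  }
}

The map itself is pure plumbing; all the leverage comes from setting the combiner, which pre-aggregates values per map task before the shuffle.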

My opinion is that handling these kinds of patterns in the framework itself is a mistake. It would introduce a lot of complexity, and the payback for applications would be relatively slight. I'd much rather have the Hadoop framework support the single primitive (map/reduce) very well and build a layer on top that provides a very general algebra over map/reduce operations. One early example of this is Pig (http://research.yahoo.com/project/pig).

-- Owen
