On May 22, 2007, at 11:31 AM, Mark Meissonnier wrote:
> Say you have a complicated function that is being called by a map
> method, but it produces a lot of information that can be used to
> produce two types of indices. Is it possible to have 2 "map" outputs,
> which branch off respectively to a reduce1 method and a reduce2 method?

No. There are a couple of ways around it.
Probably the most efficient is to make the reduces act differently
based on their partition id. So you'd say that reduces 0..999 are
doing X and reduces 1000..1999 are doing Y, with a partitioner that
sends each kind of record to the right range. Since a job has a single
map output value type, the transient data would have to be a tagged
union of the types you are sending to the different reduces.
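
A rough sketch of the partition-id idea, written as a plain Python simulation rather than against the Hadoop API. Everything named here (Tagged, partition, reduce_x, reduce_y, the two example indices, and the 4 + 4 partition split) is made up for illustration. It only shows how a tag on each map output value picks the partition range, and how the partition id then picks the reduce logic.

```python
from collections import defaultdict
from typing import NamedTuple

# Hypothetical split: partitions 0..3 do X, partitions 4..7 do Y.
NUM_X = 4
NUM_Y = 4


class Tagged(NamedTuple):
    """Tagged union for map output values: the tag says which reduce it is for."""
    tag: str       # "X" or "Y"
    value: object


def stable_hash(key: str) -> int:
    # Deterministic stand-in for a real hash function.
    return sum(key.encode("utf-8"))


def partition(key: str, tagged: Tagged) -> int:
    # The tag selects the partition range; the key selects the slot inside it.
    if tagged.tag == "X":
        return stable_hash(key) % NUM_X
    return NUM_X + stable_hash(key) % NUM_Y


def map_record(doc_id: str, text: str):
    # One pass over the input feeds both indices.
    words = text.split()
    for word in words:
        yield word, Tagged("X", doc_id)       # index 1: word -> documents
    yield doc_id, Tagged("Y", len(words))     # index 2: document -> word count


def reduce_x(key, values):
    return key, sorted(set(values))


def reduce_y(key, values):
    return key, sum(values)


def run(records):
    # partition id -> key -> list of untagged values
    partitions = defaultdict(lambda: defaultdict(list))
    for doc_id, text in records:
        for key, tagged in map_record(doc_id, text):
            partitions[partition(key, tagged)][key].append(tagged.value)

    index_x, index_y = {}, {}
    for pid, groups in partitions.items():
        for key, values in groups.items():
            # Each reduce decides what to do from its partition id alone.
            if pid < NUM_X:
                k, v = reduce_x(key, values)
                index_x[k] = v
            else:
                k, v = reduce_y(key, values)
                index_y[k] = v
    return index_x, index_y
```

Calling run([("d1", "a b a"), ("d2", "b c")]) returns the two indices as separate dictionaries. In a real job the two ranges would simply end up as different sets of output files.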
The easier approach is to have the maps write a side file with
the input for the second reduce. After your first job finishes, you
launch a second job that processes the side files as its input.
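
The side-file approach can be sketched the same way, again as a plain Python simulation and not real Hadoop code. The file naming (side-00000.jsonl), the JSON-lines format, and the functions run_job1 and run_job2 are all assumptions made for the example. A real job would write the side files to the distributed file system, one per map task.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def run_job1(records, side_dir: Path, task_id: int = 0):
    """First job: builds index 1 and writes a side file for the second job."""
    side_path = side_dir / f"side-{task_id:05d}.jsonl"   # hypothetical naming
    grouped = defaultdict(list)
    with side_path.open("w", encoding="utf-8") as side:
        for doc_id, text in records:
            words = text.split()
            for word in words:
                grouped[word].append(doc_id)              # normal map output
            # Extra information saved for later, one JSON record per line.
            side.write(json.dumps([doc_id, len(words)]) + "\n")
    # reduce1: word -> sorted list of distinct documents
    return {word: sorted(set(ids)) for word, ids in grouped.items()}


def run_job2(side_dir: Path):
    """Second job: uses the side files left by the first job as its input."""
    grouped = defaultdict(list)
    for path in sorted(side_dir.glob("side-*.jsonl")):
        with path.open(encoding="utf-8") as f:
            for line in f:
                key, value = json.loads(line)
                grouped[key].append(value)
    # reduce2: document -> total word count
    return {key: sum(values) for key, values in grouped.items()}


def demo():
    records = [("d1", "a b a"), ("d2", "b c")]
    with tempfile.TemporaryDirectory() as tmp:
        side_dir = Path(tmp)
        first = run_job1(records, side_dir)    # job 1 must finish first
        second = run_job2(side_dir)            # then job 2 reads the side files
    return first, second
```

The cost compared with the partition-id trick is the extra job and the extra pass over the side files; the benefit is that each job keeps ordinary, single-purpose map and reduce code.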
-- Owen