RE: Is it possible to have one map operation "talking" to multiple Reduce operations?

Mark Meissonnier Tue, 22 May 2007 11:55:16 -0700

The first approach works assuming the output keyspace is the same, or at
least compatible...
I was thinking I'd have to do something like a side file.
Thanks for the quick response.
Mark

-----Original Message-----
From: Owen O'Malley [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, May 22, 2007 11:44 AM
To: [email protected]
Subject: Re: Is it possible to have one map operation "talking" to
multiple Reduce operations?

On May 22, 2007, at 11:31 AM, Mark Meissonnier wrote:

> Say you have a complicated function that is being called by a map 
> method, but it produces a lot of information that can be used to 
> produce two types of indices, is it possible to have 2 "map" outputs,
> which branch off respectively to a reduce1 method and a reduce2
> method?

No. There are a couple of ways around it.

Probably the most efficient is to make the reduces act differently based
on their partition id. So you'd say that reduces 0..999 are doing X and
reduces 1000..1999 are doing Y. The intermediate data (the map output)
would have to be a tagged union of the types you are sending to the
different reduces.
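
A rough illustration of the partition-id idea, written as a plain Python simulation rather than against the Hadoop API. Everything in it is an assumption made for the example: the two "indices" (a word-to-document index for X, a per-document word count for Y), the partition counts, and the names map_record, partition, reduce_partition and run_job. The point is only that every map output carries a tag, the partitioner uses the tag to pick a partition range, and each reduce decides what to do from its partition id.

```python
import zlib
from collections import defaultdict

NUM_X = 2  # partitions 0..1 build index X (hypothetical sizes)
NUM_Y = 2  # partitions 2..3 build index Y


def stable_hash(key):
    # Deterministic hash so the same key always lands in the same partition.
    return zlib.crc32(key.encode("utf-8"))


def map_record(doc_id, text):
    """One pass over the input emits tagged records for both indices."""
    words = text.split()
    for word in words:
        yield ("X", word, doc_id)        # tagged for index X: word -> doc ids
    yield ("Y", doc_id, len(words))      # tagged for index Y: doc -> word count


def partition(tag, key):
    """Route X records to partitions 0..NUM_X-1 and Y records to the rest."""
    if tag == "X":
        return stable_hash(key) % NUM_X
    return NUM_X + stable_hash(key) % NUM_Y


def reduce_partition(pid, records):
    """The reduce behaves differently depending on its partition id."""
    grouped = defaultdict(list)
    for _tag, key, value in records:
        grouped[key].append(value)
    if pid < NUM_X:
        return {k: sorted(set(v)) for k, v in grouped.items()}  # X logic
    return {k: sum(v) for k, v in grouped.items()}              # Y logic


def run_job(docs):
    partitions = defaultdict(list)
    for doc_id, text in docs:
        for tag, key, value in map_record(doc_id, text):
            partitions[partition(tag, key)].append((tag, key, value))
    index_x, index_y = {}, {}
    for pid, records in partitions.items():
        target = index_x if pid < NUM_X else index_y
        target.update(reduce_partition(pid, records))
    return index_x, index_y
```

Calling run_job([("d1", "a b a"), ("d2", "b c")]) returns both indices from a single pass over the input. In a real job the tag would be part of the serialized intermediate value, which is the tagged union mentioned above.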

The easier approach is that you have the maps write a side file with the
input for the second reduce. After your first job finishes, you launch a
second job that processes the side files as the input.
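
The side-file approach can be sketched the same way, again as a plain Python simulation and not Hadoop code. The file naming (side-00000.tsv), the tab-separated layout, and the functions job1_map_task, reduce1, job2 and run_both are all hypothetical; a temporary directory stands in for wherever the maps would actually write their side files.

```python
import os
import tempfile
from collections import defaultdict


def job1_map_task(task_id, docs, side_dir):
    """Map task of the first job: returns the normal map output and also
    writes a side file holding the input for the second job."""
    main_output = []
    side_path = os.path.join(side_dir, f"side-{task_id:05d}.tsv")
    with open(side_path, "w", encoding="utf-8") as side:
        for doc_id, text in docs:
            words = text.split()
            for word in words:
                main_output.append((word, doc_id))      # goes to reduce 1
            side.write(f"{doc_id}\t{len(words)}\n")     # saved for job 2
    return main_output


def reduce1(pairs):
    """Reduce of the first job: word -> sorted list of doc ids."""
    index = defaultdict(set)
    for word, doc_id in pairs:
        index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def job2(side_dir):
    """Second job: launched after the first finishes, reads the side files."""
    totals = defaultdict(int)
    for name in sorted(os.listdir(side_dir)):
        with open(os.path.join(side_dir, name), encoding="utf-8") as f:
            for line in f:
                doc_id, count = line.rstrip("\n").split("\t")
                totals[doc_id] += int(count)
    return dict(totals)


def run_both(splits):
    with tempfile.TemporaryDirectory() as side_dir:
        pairs = []
        for task_id, docs in enumerate(splits):
            pairs.extend(job1_map_task(task_id, docs, side_dir))
        first = reduce1(pairs)    # first job completes
        second = job2(side_dir)   # second job consumes the side files
    return first, second
```

Each map task writes its own side file, so tasks never contend for one file, and the second job simply treats the directory of side files as its input.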

-- Owen
