On 06/12/2010 05:33 AM, Torsten Curdt wrote:
I have one data source. In the first mapper I would like to do fan
out. But I would like to emit different data types:

  Mapper:
    if (a) emit(Text, Integer)
    if (b) emit(Long, Text)

and now I would like to have a Reducer for (a) and a separate Reducer for (b).
While reading the input once for each of (a) and (b) is possible, it is too
inefficient.

Might an API like Google's FlumeJava be appropriate?

http://portal.acm.org/citation.cfm?id=1806596.1806638

I think the MapReduce project should strive to support efficient lower-level APIs, leaving higher-level APIs to other projects. For example, I think you could implement something like the above in Pig. FlumeJava manages to implement a powerful, efficient, high-level Java API on top of a presumably fairly low-level MapReduce API. The lower-level runtime can then be shared with systems like Pig & Hive.
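
To make the question concrete, here is a minimal sketch of one low-level way to get this fan-out from a single pass over the input: tag each map output with its branch, group on the tagged key, and let one reduce function dispatch on the tag. It is plain Java that only simulates the map, shuffle and reduce steps in memory; it does not use the Hadoop API. The class name FanOutSketch, the "word,id" input format, the conditions standing in for (a) and (b), and the two reduce behaviours (sum for (a), join for (b)) are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FanOutSketch {

    /** One map output, tagged with the branch it belongs to (hypothetical type). */
    static final class Tagged {
        final char branch;   // 'a' or 'b'
        final Object key;    // String for (a), Long for (b)
        final Object value;  // Integer for (a), String for (b)

        Tagged(char branch, Object key, Object value) {
            this.branch = branch;
            this.key = key;
            this.value = value;
        }
    }

    /** "Mapper": reads one "word,id" line once and emits to branch a, b, or both. */
    static List<Tagged> map(String line) {
        String[] parts = line.split(",");
        String word = parts[0];
        long id = Long.parseLong(parts[1]);
        List<Tagged> out = new ArrayList<>();
        if (!word.isEmpty()) out.add(new Tagged('a', word, 1));  // emit(Text, Integer)
        if (id > 0) out.add(new Tagged('b', id, word));          // emit(Long, Text)
        return out;
    }

    /** "Shuffle": groups values by branch tag plus key, in sorted key order. */
    static Map<String, List<Object>> shuffle(List<Tagged> records) {
        Map<String, List<Object>> groups = new TreeMap<>();
        for (Tagged r : records) {
            String taggedKey = r.branch + "\t" + r.key;
            groups.computeIfAbsent(taggedKey, k -> new ArrayList<>()).add(r.value);
        }
        return groups;
    }

    /** "Reducer": dispatches on the tag, so (a) and (b) get separate logic. */
    static String reduce(String taggedKey, List<Object> values) {
        char branch = taggedKey.charAt(0);
        String key = taggedKey.substring(2);  // skip the tag and the tab
        if (branch == 'a') {
            int sum = 0;
            for (Object v : values) sum += (Integer) v;
            return "a " + key + " " + sum;
        }
        List<String> texts = new ArrayList<>();
        for (Object v : values) texts.add((String) v);
        return "b " + key + " " + String.join(",", texts);
    }

    /** Runs map, shuffle and reduce over the whole input, reading each line once. */
    static List<String> run(List<String> lines) {
        List<Tagged> mapped = new ArrayList<>();
        for (String line : lines) mapped.addAll(map(line));
        List<String> results = new ArrayList<>();
        for (Map.Entry<String, List<Object>> e : shuffle(mapped).entrySet()) {
            results.add(reduce(e.getKey(), e.getValue()));
        }
        return results;
    }

    public static void main(String[] args) {
        for (String s : run(Arrays.asList("x,1", "y,0", "x,2"))) {
            System.out.println(s);
        }
    }
}
```

The cost of this approach is that both branches share one shuffle and one reducer class, and the types are only checked at run time. That is the kind of bookkeeping a higher-level API could hide.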

Doug
