On 06/12/2010 05:33 AM, Torsten Curdt wrote:
I have one data source. In the first mapper I would like to do fan
out. But I would like to emit different data types:
Mapper:
if (a) emit(Text, Integer)
if (b) emit(Long, Text)
and now I would like to have a Reducer for (a) and a separate Reducer for (b).
While reading the input once for each of (a) and (b) is possible, it is too
inefficient.
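To make the fan-out described above concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It only simulates the idea: one pass over the input, two differently typed outputs, and a separate reducer per output. The class name FanOutSketch, the two conditions, and both reduce functions are hypothetical, chosen for illustration; this is not an existing MapReduce API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FanOutSketch {
    // Output (a): Text key -> Integer values, grouped by key as a shuffle would do.
    private final Map<String, List<Integer>> outputA = new TreeMap<>();
    // Output (b): Long key -> Text values.
    private final Map<Long, List<String>> outputB = new TreeMap<>();

    // Hypothetical mapper: reads each record once and emits to both outputs.
    void map(String line) {
        // Assumed condition (a): every non-empty word is emitted as (word, 1).
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                outputA.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        // Assumed condition (b): every non-empty line is emitted as (length, line).
        if (!line.isEmpty()) {
            outputB.computeIfAbsent((long) line.length(), k -> new ArrayList<>()).add(line);
        }
    }

    // Single pass over the input; both outputs are filled as a side effect.
    static FanOutSketch mapAll(List<String> lines) {
        FanOutSketch sketch = new FanOutSketch();
        for (String line : lines) {
            sketch.map(line);
        }
        return sketch;
    }

    // Reducer for (a): sums the Integer values per Text key.
    Map<String, Integer> reduceA() {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : outputA.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }

    // Reducer for (b): joins the Text values per Long key.
    Map<Long, String> reduceB() {
        Map<Long, String> result = new TreeMap<>();
        for (Map.Entry<Long, List<String>> e : outputB.entrySet()) {
            result.put(e.getKey(), String.join(",", e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        FanOutSketch sketch = mapAll(Arrays.asList("x y x", "zz"));
        System.out.println(sketch.reduceA());
        System.out.println(sketch.reduceB());
    }
}
```

The point of the sketch is that the input is scanned once, while each output keeps its own key and value types and its own reduce step.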
Might an API like Google's FlumeJava be appropriate?
http://portal.acm.org/citation.cfm?id=1806596.1806638
I think the MapReduce project should strive to support efficient
lower-level APIs, leaving higher-level APIs to other projects. For
example, I think you could implement something like the above in Pig.
FlumeJava manages to implement a powerful, efficient, high-level Java
API on top of a presumably fairly low-level MapReduce API. The
lower-level runtime can then be shared with systems like Pig & Hive.
Doug