I have two directories. Directory 1 contains values of the form <k, x> and
directory 2 contains values of the form <k, y>.  The key values are the same
in the two directories. I want to take them as input and produce output of
the form <k, f(x,y)>. A reasonable strategy is to do a reduce-side join as
described in section 3.5.1 of *Data-Intensive Text Processing with MapReduce*
<http://www.amazon.com/Data-Intensive-Processing-MapReduce-Synthesis-Technologies/dp/1608453421>.
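
To make the idea concrete, here is a rough in-memory sketch of what I mean by a reduce-side join. It is plain Java, not Hadoop code: the class and method names (ReduceSideJoinSketch, Tagged, shuffle, join) are made up for illustration, both value types are String for simplicity, and the grouping step only imitates what the shuffle does between map and reduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

class ReduceSideJoinSketch {
    // One shuffled record: which directory it came from (1 or 2) and its value.
    static final class Tagged {
        final int source;
        final String value;
        Tagged(int source, String value) {
            this.source = source;
            this.value = value;
        }
    }

    // "Map" side: tag every value with its source directory and group by key,
    // imitating the grouping the shuffle performs before the reducer runs.
    static Map<String, List<Tagged>> shuffle(Map<String, String> dir1,
                                             Map<String, String> dir2) {
        Map<String, List<Tagged>> grouped = new TreeMap<>();
        dir1.forEach((k, x) ->
            grouped.computeIfAbsent(k, key -> new ArrayList<>()).add(new Tagged(1, x)));
        dir2.forEach((k, y) ->
            grouped.computeIfAbsent(k, key -> new ArrayList<>()).add(new Tagged(2, y)));
        return grouped;
    }

    // "Reduce" side: for each key, pick out x and y by tag and emit f(x, y).
    static Map<String, String> join(Map<String, List<Tagged>> grouped,
                                    BiFunction<String, String, String> f) {
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, List<Tagged>> e : grouped.entrySet()) {
            String x = null;
            String y = null;
            for (Tagged t : e.getValue()) {
                if (t.source == 1) {
                    x = t.value;
                } else {
                    y = t.value;
                }
            }
            out.put(e.getKey(), f.apply(x, y));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> dir1 = Map.of("k1", "a", "k2", "b");
        Map<String, String> dir2 = Map.of("k1", "1", "k2", "2");
        System.out.println(join(shuffle(dir1, dir2), (x, y) -> x + ":" + y));
    }
}
```

The point of the tag is that the reducer sees all values for a key in one list and needs some way to tell which one is x and which is y.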

This works fine if x and y are of the same type (e.g. they're both Text). It
also works if they are different types but both Writable (maybe x is Text
and y is IntWritable), because you can still create a Writable object that
wraps both of them and use that as the value type for both input
directories.
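
For the mixed-Writable case, the wrapper I have in mind looks roughly like the sketch below. The assumptions: the local Writable interface is a stand-in that copies the two methods of org.apache.hadoop.io.Writable so the example compiles without Hadoop on the classpath, TaggedValue and its fields are hypothetical names, and a String and an int stand in for Text and IntWritable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in with the same two methods as org.apache.hadoop.io.Writable.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical wrapper: a one-byte tag says which input the value came from,
// followed by the payload for that input.
class TaggedValue implements Writable {
    static final byte FROM_X = 0; // x, a string (Text in a real job)
    static final byte FROM_Y = 1; // y, an int (IntWritable in a real job)

    byte tag;
    String x;
    int y;

    static TaggedValue ofX(String x) {
        TaggedValue v = new TaggedValue();
        v.tag = FROM_X;
        v.x = x;
        return v;
    }

    static TaggedValue ofY(int y) {
        TaggedValue v = new TaggedValue();
        v.tag = FROM_Y;
        v.y = y;
        return v;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeByte(tag);
        if (tag == FROM_X) {
            out.writeUTF(x);
        } else {
            out.writeInt(y);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        tag = in.readByte();
        if (tag == FROM_X) {
            x = in.readUTF();
            y = 0;
        } else {
            y = in.readInt();
            x = null;
        }
    }
}

class TaggedValueDemo {
    // Serialize to bytes, as the framework would between map and reduce.
    static byte[] serialize(Writable w) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            w.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuild a TaggedValue from bytes; the tag decides which payload is read.
    static TaggedValue deserialize(byte[] bytes) {
        try {
            TaggedValue v = new TaggedValue();
            v.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return v;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TaggedValue fromDir1 = deserialize(serialize(TaggedValue.ofX("hello")));
        TaggedValue fromDir2 = deserialize(serialize(TaggedValue.ofY(42)));
        System.out.println(fromDir1.tag + " " + fromDir1.x);
        System.out.println(fromDir2.tag + " " + fromDir2.y);
    }
}
```

Because both mappers emit the same wrapper class, the job only needs one map output value type, and the reducer can branch on the tag.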

However, what if x is Writable and y is serialized with some other scheme,
say Avro? It seems like you couldn't write a MapReduce process to
generate <k, f(x,y)>, because the process can only specify a single
serialization scheme for its value. Is there a way to write a MapReduce
process to do a reduce-side join in this case?
