I have two directories. Directory 1 contains values of the form <k, x> and directory 2 contains values of the form <k, y>. The key values are the same in the two directories. I want to take them as input and produce output of the form <k, f(x,y)>. A reasonable strategy is to do a reduce-side join as described in section 3.5.1 of *Data-Intensive Text Processing with MapReduce* <http://www.amazon.com/Data-Intensive-Processing-MapReduce-Synthesis-Technologies/dp/1608453421>.
This works fine if x and y are of the same type (e.g. they're both Text). It also works if they are different types but both Writable (maybe x is Text and y is IntWritable), because you can still create a Writable object that wraps both of them and use that as the value type for both input directories. However, what if x is Writable and y is serialized with some other scheme, say Avro? It seems like you couldn't write a MapReduce job to generate <k, f(x,y)>, because the job can only specify a single serialization scheme for its value. Is there a way to write a MapReduce job to do a reduce-side join in this case?
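
To make the all-Writable case concrete, here is a rough sketch of the kind of wrapper I mean. It is not tied to a real job: TaggedValue and its fields are hypothetical names, and the class only mirrors the write/readFields method shapes of Hadoop's Writable interface using plain java.io, so it compiles without Hadoop on the classpath. A string stands in for Text and an int for IntWritable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical wrapper: holds either a string (side x) or an int (side y).
// In a real job this class would implement org.apache.hadoop.io.Writable;
// here it only imitates that interface's two methods.
final class TaggedValue {
    static final byte FROM_X = 0; // record came from directory 1
    static final byte FROM_Y = 1; // record came from directory 2

    byte tag;
    String text; // meaningful only when tag == FROM_X
    int number;  // meaningful only when tag == FROM_Y

    static TaggedValue ofX(String s) {
        TaggedValue v = new TaggedValue();
        v.tag = FROM_X;
        v.text = s;
        return v;
    }

    static TaggedValue ofY(int n) {
        TaggedValue v = new TaggedValue();
        v.tag = FROM_Y;
        v.number = n;
        return v;
    }

    // Write the tag first so the reader knows which payload follows.
    void write(DataOutput out) throws IOException {
        out.writeByte(tag);
        if (tag == FROM_X) {
            out.writeUTF(text);
        } else {
            out.writeInt(number);
        }
    }

    // Read the tag, then only the payload that belongs to it.
    void readFields(DataInput in) throws IOException {
        tag = in.readByte();
        if (tag == FROM_X) {
            text = in.readUTF();
            number = 0;
        } else {
            number = in.readInt();
            text = null;
        }
    }

    // Serialize and deserialize in memory, as the shuffle would between map and reduce.
    static TaggedValue roundTrip(TaggedValue v) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            v.write(new DataOutputStream(buf));
            TaggedValue copy = new TaggedValue();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TaggedValue x = roundTrip(ofX("hello"));
        TaggedValue y = roundTrip(ofY(42));
        System.out.println(x.tag + " " + x.text);
        System.out.println(y.tag + " " + y.number);
    }
}
```

The reducer for key k would then look at the tag on each value to tell x from y before computing f(x,y). My question is what to do when one of the two payloads can't be expressed this way.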
