Hello all,

After the recent talk about joins, I have a (possibly) stupid question:
What is the difference between the "join" operations in o.a.h.mapred.join and the standard merge step in a MapReduce job? I understand that doing a join in the Mapper would be much more efficient if you're lucky enough to have your input pre-sorted and pre-partitioned. But how is a join operation in the Reducer any different from the shuffle/sort/merge that the MapReduce framework already does?

Be gentle.

Thanks,
-Stuart