Difference between joining and reducing

Stuart Sierra Thu, 03 Jul 2008 07:54:36 -0700

Hello all,

After recent talk about joins, I have a (possibly) stupid question:


What is the difference between the "join" operations in
org.apache.hadoop.mapred.join and the standard merge step in a
MapReduce job?

I understand that doing a join in the Mapper would be much more
efficient if you're lucky enough to have your input pre-sorted and
pre-partitioned.
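
As a rough illustration of why pre-sorted input helps: two inputs that
are already sorted by key can be joined in one linear pass, with no
shuffle or re-sort. The sketch below is plain Python, not the Hadoop
API; merge_join and the sample records are made up for illustration.

```python
def merge_join(left, right):
    """Inner-join two lists of (key, value) pairs already sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1          # key only on the left so far; advance left
        elif lk > rk:
            j += 1          # key only on the right so far; advance right
        else:
            # Find the run of equal keys on each side, then emit every pairing.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


# Hypothetical sample data: both inputs sorted by key.
users = [(1, "ann"), (2, "bob"), (4, "dee")]
orders = [(1, "book"), (1, "pen"), (3, "cup"), (4, "hat")]
print(merge_join(users, orders))
# → [(1, 'ann', 'book'), (1, 'ann', 'pen'), (4, 'dee', 'hat')]
```

If the two inputs are also partitioned the same way, each pair of
matching partitions could be joined like this independently.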

But how is a join operation in the Reducer any different from the
shuffle/sort/merge that the MapReduce framework already does?

Be gentle.  Thanks,
-Stuart
