For historical interest, the JIRAs where this was discussed were MAPREDUCE-326 and MAPREDUCE-1639.

On Wed, Oct 19, 2011 at 2:44 PM, Todd Lipcon <[email protected]> wrote:
> On Wed, Oct 19, 2011 at 2:42 PM, Zheng Shao <[email protected]> wrote:
>> Google's Tenzing paper mentioned that they modified MR to make sorting in
>> reducer optional:
>> http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/37200.pdf
>>
>> Is there any plan to support that in MR 2.0?
>
> Hey Zheng,
>
> I don't know that anyone is working on it, but IMO the main advantage
> of MR2 here is that it will be much easier for users to experiment
> with new ideas on top of a shared cluster. Since the MR framework code
> becomes user-level submitted code, it's easy to recompile and resubmit
> jobs with a hacked MR without restarting the cluster or impacting
> other users.
>
> Would be interesting to see the Hive project experiment with this
> optimization - I remember discussing this on a JIRA with Joydeep a
> couple years ago.
>
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera

--
Todd Lipcon
Software Engineer, Cloudera
