On Wed, Oct 19, 2011 at 2:42 PM, Zheng Shao wrote:
> Google's Tenzing paper mentioned that they modified MR to make sorting in
> reducer optional:
> http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/37200.pdf
>
> Is there any plan to support that in MR 2.0?
Hey Zheng,

I don't know that anyone is working on it, but IMO the main advantage of MR2 here is that it will be much easier for users to experiment with new ideas on top of a shared cluster. Since the MR framework code becomes user-level submitted code, it's easy to recompile and resubmit jobs with a hacked MR without restarting the cluster or impacting other users.

It would be interesting to see the Hive project experiment with this optimization. I remember discussing this on a JIRA with Joydeep a couple of years ago.

-Todd
--
Todd Lipcon
Software Engineer, Cloudera
