One more idea for a GSoC project. YSmart uses correlations between multiple MR jobs to reduce the number of MR jobs generated. I remember Dmitriy bringing this up earlier. The techniques specified in this paper (Input, Job Flow, Transit correlations) have been patched into Hive. If Pig doesn't use these optimizations, then I think it would be good to have them in Pig as well.
Here is the link to the paper
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

I think this can be a good candidate project for GSoC.

Thanks
-- Prasanth

On Mar 21, 2013, at 3:51 PM, Olga Natkovich <[email protected]> wrote:

> +1 on that
>
>
> ________________________________
> From: Russell Jurney <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Thursday, March 21, 2013 11:54 AM
> Subject: Re: Put a "Google summer of code 2013" cwiki page
>
> Make Grunt use Antlr - high priority one for me. Once Grunt uses Antlr,
> macros will flourish.
>
>
> On Wed, Mar 20, 2013 at 6:25 PM, Daniel Dai <[email protected]> wrote:
>
>> https://cwiki.apache.org/confluence/display/PIG/GSoc2013
>>
>> Feel free to add more project which could fit in the timeline of a
>> student summer project.
>>
>> I remember there are several projects we discussed in our last meetup:
>> * Allow Pig use Hive UDFs, Alan, do we have a ticket for that?
>> * A general framework for Pig performance test, Rohini, do we have a
>> ticket?
>>
>> Thanks,
>> Daniel
>>
>
>
>
> --
> Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
