This is a little different from how we've done such things before, but how about a project to get Pig to run on Spark (aka Spork)? The Twitter Pig folks have some code we'd love to share that got us halfway there, and it was looking pretty promising. If anyone is curious, it's the "spork" branch on my GitHub fork of Pig: https://github.com/dvryaboy/pig

D

On Thu, Mar 21, 2013 at 2:05 PM, Prasanth J <[email protected]> wrote:

> One more idea for GSoC project.
>
> YSmart uses correlation between multiple MR jobs to reduce the number of
> MR jobs generated. I remember Dmitriy bringing this up early. The
> techniques specified in this paper (Input, Job Flow, Transit correlations)
> has been patched into Hive. If Pig doesn't use these optimizations then I
> think it will be good to have them in Pig as well.
>
> Here is the link to the paper
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
>
> I think this can be a good candidate project for GSoC.
>
> Thanks
> -- Prasanth
>
> On Mar 21, 2013, at 3:51 PM, Olga Natkovich <[email protected]> wrote:
>
> > +1 on that
> >
> > ________________________________
> > From: Russell Jurney <[email protected]>
> > To: "[email protected]" <[email protected]>
> > Sent: Thursday, March 21, 2013 11:54 AM
> > Subject: Re: Put a "Google summer of code 2013" cwiki page
> >
> > Make Grunt use Antlr - high priority one for me. Once Grunt uses Antlr,
> > macros will flourish.
> >
> > On Wed, Mar 20, 2013 at 6:25 PM, Daniel Dai <[email protected]> wrote:
> >
> >> https://cwiki.apache.org/confluence/display/PIG/GSoc2013
> >>
> >> Feel free to add more project which could fit in the timeline of a
> >> student summer project.
> >>
> >> I remember there are several projects we discussed in our last meetup:
> >> * Allow Pig use Hive UDFs, Alan, do we have a ticket for that?
> >> * A general framework for Pig performance test, Rohini, do we have a
> >> ticket?
> >>
> >> Thanks,
> >> Daniel
> >
> > --
> > Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
