One more idea for a GSoC project.

YSmart uses correlations between multiple MR jobs to reduce the number of MR
jobs generated. I remember Dmitriy bringing this up earlier. The techniques
described in this paper (Input, Job Flow, and Transit correlations) have been
patched into Hive. If Pig doesn't already use these optimizations, I think it
would be good to have them in Pig as well.

Here is the link to the paper:
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf

I think this can be a good candidate project for GSoC. 

Thanks
-- Prasanth

On Mar 21, 2013, at 3:51 PM, Olga Natkovich <[email protected]> wrote:

> +1 on that
> 
> 
> ________________________________
> From: Russell Jurney <[email protected]>
> To: "[email protected]" <[email protected]> 
> Sent: Thursday, March 21, 2013 11:54 AM
> Subject: Re: Put a "Google summer of code 2013" cwiki page
> 
> Make Grunt use Antlr - high priority one for me. Once Grunt uses Antlr,
> macros will flourish.
> 
> 
> On Wed, Mar 20, 2013 at 6:25 PM, Daniel Dai <[email protected]> wrote:
> 
>> https://cwiki.apache.org/confluence/display/PIG/GSoc2013
>> 
>> Feel free to add more projects that could fit in the timeline of a
>> student summer project.
>> 
>> I remember there are several projects we discussed in our last meetup:
>> * Allow Pig to use Hive UDFs. Alan, do we have a ticket for that?
>> * A general framework for Pig performance tests. Rohini, do we have a
>> ticket?
>> 
>> Thanks,
>> Daniel
>> 
> 
> 
> 
> -- 
> Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
