[ https://issues.apache.org/jira/browse/PIG-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931439#comment-13931439 ]

Zhiwei Cai commented on PIG-2784:
---------------------------------

Hi,

My name is Zhiwei Cai and I'm writing a proposal for this project for GSoC 2014.
I have a few questions about this idea and hope someone can clarify them for
me. I would be grateful for any guidance.
1. Do we know the size of the data to be processed before Pig compiles the job?
2. What's the difference between implementing this framework inside
JobControlCompiler and inside MRCompiler? Which one do you think is better?
3. Do I need to consider other kinds of optimization beyond those
mentioned in the description? Could we categorize the optimizations into
several types to make the framework easier to extend in the future?
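To make question 3 more concrete, here is a rough sketch of one possible design, assuming each dynamic optimization is a pluggable rule tagged with a category and applied to the next batch of jobs once runtime statistics (such as input size or detected skew) are available. All names here (DynamicRule, PendingJob, RuntimeStats, Category, optimizeBatch) are hypothetical and are not part of Pig's API; the reducer rule only loosely mirrors the existing reducer estimation, with an assumed bytes-per-reducer threshold.

```java
import java.util.List;

// Hypothetical sketch: none of these types exist in Pig; they only illustrate
// grouping dynamic optimizations into extensible categories.
public class DynamicOptimizerSketch {

    // Coarse optimization categories (hypothetical).
    enum Category { PARALLELISM, JOB_ELIMINATION, OPERATOR_SELECTION }

    // Facts observed only after upstream jobs have run (hypothetical).
    static final class RuntimeStats {
        final long inputBytes;
        final boolean skewDetected;
        RuntimeStats(long inputBytes, boolean skewDetected) {
            this.inputBytes = inputBytes;
            this.skewDetected = skewDetected;
        }
    }

    // A job in the next, not-yet-submitted batch of the plan (hypothetical).
    static final class PendingJob {
        final String name;
        final boolean isJoin;
        int reducers = 1;
        boolean useSkewJoin = false;
        PendingJob(String name, boolean isJoin) {
            this.name = name;
            this.isJoin = isJoin;
        }
    }

    // Every dynamic optimization implements one interface and declares a category.
    interface DynamicRule {
        Category category();
        // Returns true if the rule modified the job.
        boolean apply(PendingJob job, RuntimeStats stats);
    }

    // Sets the reducer count from the observed input size (ceiling division).
    static final class ReducerEstimation implements DynamicRule {
        private final long bytesPerReducer; // assumed threshold, not a Pig setting
        ReducerEstimation(long bytesPerReducer) { this.bytesPerReducer = bytesPerReducer; }
        public Category category() { return Category.PARALLELISM; }
        public boolean apply(PendingJob job, RuntimeStats stats) {
            long needed = (stats.inputBytes + bytesPerReducer - 1) / bytesPerReducer;
            int estimated = (int) Math.max(1, needed);
            boolean changed = estimated != job.reducers;
            job.reducers = estimated;
            return changed;
        }
    }

    // Switches a join to a skew join when skew has been detected at runtime.
    static final class SkewJoinSwitch implements DynamicRule {
        public Category category() { return Category.OPERATOR_SELECTION; }
        public boolean apply(PendingJob job, RuntimeStats stats) {
            if (!job.isJoin || !stats.skewDetected || job.useSkewJoin) return false;
            job.useSkewJoin = true;
            return true;
        }
    }

    // Applies all registered rules to one batch before it is submitted.
    static int optimizeBatch(List<PendingJob> batch, RuntimeStats stats, List<DynamicRule> rules) {
        int changes = 0;
        for (PendingJob job : batch) {
            for (DynamicRule rule : rules) {
                if (rule.apply(job, stats)) changes++;
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        PendingJob join = new PendingJob("join", true);
        List<DynamicRule> rules = List.of(new ReducerEstimation(1000), new SkewJoinSwitch());
        int changes = optimizeBatch(List.of(join), new RuntimeStats(2500, true), rules);
        System.out.println(join.name + ": reducers=" + join.reducers
                + ", skewJoin=" + join.useSkewJoin + ", changes=" + changes);
    }
}
```

Running main prints `join: reducers=3, skewJoin=true, changes=2`. With this shape, a new optimization (for example, removing the order-by sample job) would only need a new rule class in the JOB_ELIMINATION category, without touching the batch loop.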

Best,
Zhiwei

> Framework for dynamic query optimization
> ----------------------------------------
>
>                 Key: PIG-2784
>                 URL: https://issues.apache.org/jira/browse/PIG-2784
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Jie Li
>              Labels: GSOC2014
>
> We need a framework to implement dynamic query optimization, i.e. changing
> the query plan at runtime. Currently we support estimating the number of
> reducers dynamically, which works well as a first step but is not
> perfectly implemented. In the near future, we'll support more dynamic
> optimizations, like [removing the sample job for
> order-by|https://issues.apache.org/jira/browse/PIG-483], [removing the limit
> job|https://issues.apache.org/jira/browse/PIG-2675], dynamically detecting
> skew and using skew-join, etc.
> Currently, reducer estimation is implemented in JobControlCompiler, after
> MRCompiler has compiled all the MapReduceOperators and generated the complete
> MRPlan. One place (discussed with Thejas) to implement the framework is
> MRCompiler, where the MRPlan would be generated in batches and adjusted
> dynamically.
> Any comment?
> This is a candidate project for Google Summer of Code 2014. More information
> about the program can be found at
> https://cwiki.apache.org/confluence/display/PIG/GSoc2014



--
This message was sent by Atlassian JIRA
(v6.2#6252)
