[
https://issues.apache.org/jira/browse/PIG-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939962#comment-13939962
]
Aniket Mokashi commented on PIG-2784:
-------------------------------------
[~rajitha], [~zhiweicai], thanks for your interest in this project.
To proceed, you need to submit a proposal detailing your approach, plan,
etc. If you would like to clarify anything, please use this jira for
discussion.
bq. Do we know about the size of data to process before Pig compiles the job?
Yes. The input size is known before the job is compiled, which is what lets
Pig estimate the number of reducers.
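For illustration, here is a minimal sketch of how an input-size-based estimate can work: divide the total input size by a bytes-per-reducer target and cap the result. The class and method names are hypothetical and this is not Pig's actual estimator code. The two constants are assumptions based on the documented defaults of pig.exec.reducers.bytes.per.reducer and pig.exec.reducers.max.

```java
public class ReducerEstimateSketch {
    // Assumed defaults, taken from the documented values of
    // pig.exec.reducers.bytes.per.reducer and pig.exec.reducers.max.
    public static final long BYTES_PER_REDUCER = 1000L * 1000 * 1000;
    public static final int MAX_REDUCERS = 999;

    /**
     * Hypothetical helper: ceil(totalInputBytes / bytesPerReducer),
     * clamped to the range [1, maxReducers].
     */
    public static int estimateReducers(long totalInputBytes,
                                       long bytesPerReducer,
                                       int maxReducers) {
        if (totalInputBytes < 0 || bytesPerReducer <= 0 || maxReducers <= 0) {
            throw new IllegalArgumentException("sizes and limits must be positive");
        }
        // Ceiling division written to avoid overflow for very large inputs.
        long wanted = totalInputBytes / bytesPerReducer
                + (totalInputBytes % bytesPerReducer == 0 ? 0 : 1);
        wanted = Math.max(1L, wanted);
        return (int) Math.min((long) maxReducers, wanted);
    }

    public static void main(String[] args) {
        // 2.5 GB of input with a 1 GB target per reducer.
        System.out.println(
                estimateReducers(2_500_000_000L, BYTES_PER_REDUCER, MAX_REDUCERS));
    }
}
```

Because the only input is the total size, an estimate like this can be computed before any job runs. That is also its limit: it knows nothing about the size of intermediate data produced later in the plan.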
bq. What's the difference between implementing this framework inside
JobControlCompiler and inside MRCompiler? Which one do you think is better?
- MRCompiler compiles the physical plan into mapreduce operators, and
JobControlCompiler takes these compiled jobs and submits them to run on hadoop
via hadoop's jobcontrol api. It's also responsible for maintaining progress
reports, stats, etc. As part of this jira, you need to work out how any (or
all) of these optimizations can be applied and find the best place to plug
them in. I look forward to seeing your thoughts on how it should work.
bq. Do I need to consider more kinds of optimization than the ones
mentioned in the description? Is it possible to categorize the
optimizations into several types to make the framework easier to extend in the future?
It would be nice if the framework allowed new optimizations to be added in the future.
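As a rough sketch of what such an extension point could look like: each optimization implements a small interface, and the framework asks every registered rule whether it applies to the stats gathered at runtime. Every name below (DynamicOptimizerSketch, RuntimeStats, DynamicOptimization, SkewJoinRule) and the 0.5 skew threshold are hypothetical and chosen only for illustration. None of them exist in Pig, and where this would hook into MRCompiler or JobControlCompiler is left open.

```java
import java.util.ArrayList;
import java.util.List;

public class DynamicOptimizerSketch {

    /** Hypothetical runtime facts collected after a batch of jobs has run. */
    public static final class RuntimeStats {
        public final long intermediateBytes;
        public final double hottestKeyShare; // fraction of records on the most frequent key

        public RuntimeStats(long intermediateBytes, double hottestKeyShare) {
            this.intermediateBytes = intermediateBytes;
            this.hottestKeyShare = hottestKeyShare;
        }
    }

    /** Hypothetical extension point: one implementation per optimization. */
    public interface DynamicOptimization {
        String name();
        boolean applies(RuntimeStats stats);
    }

    /** Example rule; the 0.5 threshold is an arbitrary illustrative value. */
    public static final class SkewJoinRule implements DynamicOptimization {
        @Override public String name() { return "skew-join"; }
        @Override public boolean applies(RuntimeStats stats) {
            return stats.hottestKeyShare > 0.5;
        }
    }

    private final List<DynamicOptimization> rules = new ArrayList<>();

    /** New optimizations are added by registering another rule. */
    public void register(DynamicOptimization rule) {
        rules.add(rule);
    }

    /** Returns the names of the rules that would fire for the given stats. */
    public List<String> select(RuntimeStats stats) {
        List<String> fired = new ArrayList<>();
        for (DynamicOptimization rule : rules) {
            if (rule.applies(stats)) {
                fired.add(rule.name());
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        DynamicOptimizerSketch optimizer = new DynamicOptimizerSketch();
        optimizer.register(new SkewJoinRule());
        System.out.println(optimizer.select(new RuntimeStats(1_000_000L, 0.8)));
    }
}
```

The sketch only selects rules. A real framework would also need each rule to rewrite the remaining part of the plan, which is the part the proposal should spell out.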
> Framework for dynamic query optimization
> ----------------------------------------
>
> Key: PIG-2784
> URL: https://issues.apache.org/jira/browse/PIG-2784
> Project: Pig
> Issue Type: New Feature
> Reporter: Jie Li
> Labels: GSOC2014
>
> We need a framework to implement dynamic query optimization, i.e. changing
> the query plan at runtime. Currently we support estimating the number of
> reducers dynamically, which works well as a first step but is not
> perfectly implemented. In the near future, we'll support more dynamic
> optimizations, like [removing sample job for
> order-by|https://issues.apache.org/jira/browse/PIG-483], [removing limit
> job|https://issues.apache.org/jira/browse/PIG-2675], dynamically detecting
> skew and using skew-join, etc.
> Currently, estimating the number of reducers is implemented in
> JobControlCompiler, after MRCompiler compiles all the MapReduceOperators and
> generates the complete MRPlan. One place (discussed with Thejas) to implement
> the framework is in the MRCompiler, where the MRPlan would be generated in
> batches and adjusted dynamically.
> Any comments?
> This is a candidate project for Google summer of code 2014. More information
> about the program can be found at
> https://cwiki.apache.org/confluence/display/PIG/GSoc2014