[ https://issues.apache.org/jira/browse/PIG-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Mokashi updated PIG-3928:
--------------------------------

    Fix Version/s:     (was: 0.13.0)
                   0.14.0

> Reducer estimator gets wrong configuration for ORDER_BY job
> -----------------------------------------------------------
>
>                 Key: PIG-3928
>                 URL: https://issues.apache.org/jira/browse/PIG-3928
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.12.1, 0.13.0
>            Reporter: Aniket Mokashi
>             Fix For: 0.14.0
>
>
> The SAMPLER job requires a parameter that must equal the number of reducers
> used by the ORDER_BY job. It is computed by taking the successor of the
> SAMPLER job and estimating that successor's reducers in the following code.
> However, the job (conf) passed to calculateRuntimeReducers belongs to the
> SAMPLER job instead of the ORDER_BY job, which breaks custom reducer
> estimators that depend on the configuration.
> {code}
> // inside org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
>     public void adjustNumReducers(MROperPlan plan, MapReduceOper mro,
>             org.apache.hadoop.mapreduce.Job nwJob) throws IOException {
>         int jobParallelism = calculateRuntimeReducers(mro, nwJob);
>         if (mro.isSampler() && plan.getSuccessors(mro) != null) {
>             // We need to calculate the final number of reducers of the next job (order-by or skew-join)
>             // to generate the quantfile.
>             MapReduceOper nextMro = plan.getSuccessors(mro).get(0);
>             // Here we use the same conf and Job to calculate the runtime #reducers of the next job
>             // which is fine as the statistics come from the nextMro's POLoads
>             int nPartitions = calculateRuntimeReducers(nextMro, nwJob);
>             // set the runtime #reducer of the next job as the #partition
>             ParallelConstantVisitor visitor =
>                     new ParallelConstantVisitor(mro.reducePlan, nPartitions);
>             visitor.visit();
>         }
>         // ... (remainder of method not shown)
>     }
> {code}
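To make the failure mode concrete, here is a hypothetical, self-contained Java model. It is not Pig or Hadoop code: `ConfDependentEstimator`, `BYTES_PER_REDUCER`, `partitionsForNextJob`, the configuration key `example.bytes.per.reducer` and the byte counts are all invented for illustration. As the comment in `adjustNumReducers` describes, the estimator takes the input size from the ORDER_BY job's loads. It also reads a tuning value from whatever configuration it is handed, so passing the SAMPLER job's conf silently changes the result.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, self-contained model of PIG-3928. All names and the
 * "example.bytes.per.reducer" key are illustrative, not Pig or Hadoop APIs.
 */
public class ReducerEstimateDemo {

    /** Stand-in for a custom reducer estimator whose result depends on job configuration. */
    interface ConfDependentEstimator {
        int estimate(Map<String, String> conf, long inputBytes);
    }

    /** Hypothetical estimator: one reducer per "example.bytes.per.reducer" bytes of input. */
    static final ConfDependentEstimator BYTES_PER_REDUCER = (conf, inputBytes) -> {
        long perReducer = Long.parseLong(
                conf.getOrDefault("example.bytes.per.reducer", "1000000000"));
        // Ceiling division, never fewer than one reducer.
        return (int) Math.max(1L, (inputBytes + perReducer - 1) / perReducer);
    };

    /**
     * Estimates the number of partitions for the job that follows the sampler.
     * Which configuration the caller passes in is exactly what the issue is about.
     */
    static int partitionsForNextJob(ConfDependentEstimator estimator,
                                    Map<String, String> conf, long nextJobInputBytes) {
        return estimator.estimate(conf, nextJobInputBytes);
    }

    public static void main(String[] args) {
        // Input size comes from the ORDER_BY job's loads in both cases.
        long orderByInputBytes = 4_000_000_000L;

        // In this model the sampler job lacks the tuning key; the ORDER_BY job sets it.
        Map<String, String> samplerConf = new HashMap<>();
        Map<String, String> orderByConf = new HashMap<>();
        orderByConf.put("example.bytes.per.reducer", "250000000");

        // Reported behaviour: the SAMPLER job's conf is reused for the ORDER_BY estimate.
        int withSamplerConf =
                partitionsForNextJob(BYTES_PER_REDUCER, samplerConf, orderByInputBytes);
        // Intended behaviour: estimate with the ORDER_BY job's own conf.
        int withOrderByConf =
                partitionsForNextJob(BYTES_PER_REDUCER, orderByConf, orderByInputBytes);

        System.out.println("partitions with sampler conf:  " + withSamplerConf);
        System.out.println("partitions with order-by conf: " + withOrderByConf);
    }
}
```

In this toy model the SAMPLER configuration lacks the key, so the estimator falls back to its default and returns 4 partitions, while the ORDER_BY configuration yields 16. The sampler would therefore size its output for 4 partitions even though the configuration of the job that consumes it calls for 16.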



--
This message was sent by Atlassian JIRA
(v6.2#6252)
