[ 
https://issues.apache.org/jira/browse/PIG-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404063#comment-13404063
 ] 

Jie Li commented on PIG-2779:
-----------------------------

For the order-by, we need to pass its *final* #reducer (not the estimated one) 
to the sample job that generates the partition file; otherwise the partition 
file will be inconsistent with the order-by job and cause errors.

The final #reducer is calculated from the requested one and the estimated 
one, and the estimated one is derived from the input data size. Luckily, 
the sample job has the same input data as the order-by, so it can calculate 
the order-by's final #reducer in advance.
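
A minimal sketch of that idea, not the actual Pig code: the class, the method 
names, and the rule "an explicit parallel wins, otherwise use the estimate" 
are assumptions made for illustration. The bytes-per-reducer and max-reducers 
values mimic the kind of settings Pig uses for estimation but are hard-coded 
here. The point is only that the sample job and the order-by can call the same 
function on the same input size and so agree on the final #reducer.

```java
public class FinalReducers {

    // Hypothetical estimator: one reducer per bytesPerReducer of input,
    // rounded up, clamped to the range [1, maxReducers].
    static int estimateReducers(long inputBytes, long bytesPerReducer, int maxReducers) {
        long n = (inputBytes + bytesPerReducer - 1) / bytesPerReducer; // ceiling division
        return (int) Math.max(1, Math.min(maxReducers, n));
    }

    // Assumed rule: a positive requested parallelism wins, otherwise fall
    // back to the estimate. The real rule in Pig may differ.
    static int finalReducers(int requested, int estimated) {
        return requested > 0 ? requested : estimated;
    }

    public static void main(String[] args) {
        long inputBytes = 3_500_000_000L;   // same input for both jobs
        long bytesPerReducer = 1_000_000_000L;
        int maxReducers = 999;
        int requested = 2;                  // e.g. "parallel 2" on the order-by

        int estimated = estimateReducers(inputBytes, bytesPerReducer, maxReducers);

        // Both jobs derive the value from the same inputs, so the partition
        // file written by the sample job matches the order-by's reducers.
        int forSampleJob = finalReducers(requested, estimated);
        int forOrderBy = finalReducers(requested, estimated);

        System.out.println("estimated=" + estimated
                + " sampleJob=" + forSampleJob
                + " orderBy=" + forOrderBy);
    }
}
```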
                
> Refactoring the code for setting number of reducers
> ---------------------------------------------------
>
>                 Key: PIG-2779
>                 URL: https://issues.apache.org/jira/browse/PIG-2779
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Jie Li
>             Fix For: 0.11
>
>
> As PIG-2652 observed, the code for setting the number of reducers is 
> currently a little messy. MapReduceOper.requestedParallelism seems to be 
> misused in some places, and we now support runtime estimation of #reducer, 
> which further complicates the problem.
> For example, if we specify parallel 1 for the order-by, the estimated 
> #reducer will be used. If we specify parallel 2 while it estimates 4, 
> order-by will fail due to "Illegal partition for Null". If we specify 
> parallel 4 while it estimates 2, then some reducers will have nothing to do. 
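
The two mismatch cases in the description above can be reproduced with a toy 
range partitioner. This is an illustrative sketch only: the class, the 
integer keys, and the cut points are hypothetical and are not Pig's 
partitioner. A partition file built for N reducers holds N-1 cut points, so 
it can produce partition indexes 0..N-1 regardless of how many reducers the 
order-by job actually runs.

```java
import java.util.Arrays;

public class PartitionMismatch {

    // Index of the range that a key falls into, given sorted cut points.
    // With k cut points the result is in the range [0, k].
    static int partitionFor(int key, int[] cutPoints) {
        int pos = Arrays.binarySearch(cutPoints, key);
        return pos >= 0 ? pos : -(pos + 1);
    }

    // A partition index is legal only if it is below the reducer count.
    static boolean isLegal(int partition, int numReducers) {
        return partition >= 0 && partition < numReducers;
    }

    public static void main(String[] args) {
        // Partition file sampled for 4 reducers, job runs with 2:
        // large keys map to partition 3, which does not exist.
        int[] cutsForFour = {10, 20, 30};
        int p = partitionFor(35, cutsForFour);
        System.out.println("partition=" + p + " legalWith2=" + isLegal(p, 2));

        // Partition file sampled for 2 reducers, job runs with 4:
        // every key maps to 0 or 1, so reducers 2 and 3 stay idle.
        int[] cutsForTwo = {15};
        System.out.println("low=" + partitionFor(5, cutsForTwo)
                + " high=" + partitionFor(99, cutsForTwo));
    }
}
```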
