[ https://issues.apache.org/jira/browse/PIG-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254253#comment-13254253 ]

Dmitriy V. Ryaboy commented on PIG-2652:
----------------------------------------

Patching things up in JobControlCompiler (JCC), or getting JCC to run before 
MRCompiler, appears to require a significant rewrite.

Alternate proposal: 
If parallelism is not set explicitly and no default is specified, set the 
number of quantiles to pig.exec.reducers.max. WeightedPartitioner will then 
need to look at its actual parallelism and distribute the quantiles (at most 
pig.exec.reducers.max of them) evenly among partitions. We'd need to do 
something like that anyway if we used LoadFunc-reported histograms, or existing 
samples, to do the weighted partitioning instead of running a sampling job 
every time.
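A minimal sketch of the proposed fallback, assuming the partitioner can learn its actual reducer count at runtime. Everything here is hypothetical: QuantileFolding, quantileCount and partitionFor are illustrative names, not Pig APIs, and the sketch ignores the per-key weights that a real weighted partitioner also has to honor. It picks the quantile count from explicit parallelism, then the default, then pig.exec.reducers.max, and folds quantile q onto reducer floor(q * numPartitions / numQuantiles), so each reducer receives a contiguous, near-equal run of quantiles.

```java
import java.util.Arrays;

// Hypothetical illustration of the proposal, not Pig code.
public class QuantileFolding {

    // Quantile count: explicit PARALLEL, else default parallel, else pig.exec.reducers.max.
    // Non-positive values stand for "not set".
    static int quantileCount(int requestedParallel, int defaultParallel, int maxReducers) {
        if (requestedParallel > 0) return requestedParallel;
        if (defaultParallel > 0) return defaultParallel;
        return maxReducers;
    }

    // Map quantile q (0 .. numQuantiles-1) to a reducer (0 .. numPartitions-1).
    // Contiguous quantiles land on the same reducer, preserving global order.
    static int partitionFor(int q, int numQuantiles, int numPartitions) {
        // Long arithmetic avoids int overflow when both counts are large.
        return (int) ((long) q * numPartitions / numQuantiles);
    }

    public static void main(String[] args) {
        // Nothing set explicitly: sample into 999 quantiles (assumed max).
        int numQuantiles = quantileCount(-1, -1, 999);
        // Suppose the job later ends up with 10 reducers.
        int numPartitions = 10;
        int[] load = new int[numPartitions];
        for (int q = 0; q < numQuantiles; q++) {
            load[partitionFor(q, numQuantiles, numPartitions)]++;
        }
        System.out.println(numQuantiles + " quantiles -> " + Arrays.toString(load));
    }
}
```

With 999 quantiles and 10 reducers, main prints "999 quantiles -> [100, 100, 100, 100, 100, 100, 100, 100, 100, 99]". Using contiguous ranges keeps the total order that order by needs, and the floor mapping guarantees that reducer loads differ by at most one quantile.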

Thoughts?
                
> Skew join and order by don't trigger reducer estimation
> -------------------------------------------------------
>
>                 Key: PIG-2652
>                 URL: https://issues.apache.org/jira/browse/PIG-2652
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Bill Graham
>            Assignee: Bill Graham
>             Fix For: 0.10.0, 0.9.3, 0.11
>
>         Attachments: PIG-2652_1.patch
>
>
> If neither PARALLEL, default parallel, nor {{mapred.reduce.tasks}} is set, the 
> number of reducers is not estimated from input size for skew joins or 
> order by. Instead, these jobs get only 1 reducer.
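For reference, the input-size estimation that these jobs miss out on works roughly as follows. This is a simplified sketch, not Pig's code: ReducerEstimate and estimateReducers are hypothetical names, and the defaults (1,000,000,000 bytes per reducer via pig.exec.reducers.bytes.per.reducer, capped at 999 via pig.exec.reducers.max) are assumptions that should be checked against the Pig version in use.

```java
// Hypothetical, simplified sketch of input-size-based reducer estimation.
public class ReducerEstimate {
    // Assumed defaults for pig.exec.reducers.bytes.per.reducer and pig.exec.reducers.max.
    static final long DEFAULT_BYTES_PER_REDUCER = 1000L * 1000 * 1000;
    static final int DEFAULT_MAX_REDUCERS = 999;

    static int estimateReducers(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        // One reducer per bytesPerReducer of input, rounded up.
        long reducers = (totalInputBytes + bytesPerReducer - 1) / bytesPerReducer;
        // Never fewer than one reducer...
        reducers = Math.max(1, reducers);
        // ...and never more than the configured maximum.
        return (int) Math.min(maxReducers, reducers);
    }

    public static void main(String[] args) {
        long input = 25L * 1000 * 1000 * 1000; // 25 GB of input
        System.out.println(estimateReducers(input, DEFAULT_BYTES_PER_REDUCER, DEFAULT_MAX_REDUCERS));
    }
}
```

For 25 GB of input this yields 25 reducers. Because the estimate can never exceed pig.exec.reducers.max, computing that many quantiles up front covers whatever reducer count is chosen later.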
