[ https://issues.apache.org/jira/browse/PIG-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254253#comment-13254253 ]
Dmitriy V. Ryaboy commented on PIG-2652:
----------------------------------------

Patching things up in JobControlCompiler, or getting JCC to run before MRCompiler, appears to require a significant rewrite.

Alternate proposal: If parallelism is not set explicitly, and no default is specified, set the number of quantiles to pig.exec.reducers.max. WeightedPartitioner will then need to look at its actual parallelism and evenly distribute the (up to max-reducers) quantiles among partitions.

We'd need to do something like that anyway if we used LoadFunc-reported histograms, or existing samples, to do the weighted partitioning, instead of running a sampling job every time.

Thoughts?

> Skew join and order by don't trigger reducer estimation
> -------------------------------------------------------
>
>                 Key: PIG-2652
>                 URL: https://issues.apache.org/jira/browse/PIG-2652
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Bill Graham
>            Assignee: Bill Graham
>             Fix For: 0.10.0, 0.9.3, 0.11
>
>         Attachments: PIG-2652_1.patch
>
>
> If neither PARALLEL, default parallel or {{mapred.reduce.tasks}} are set, the
> number of reducers is not estimated based on input size for skew joins or
> order by. Instead, these jobs get only 1 reducer.
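
To make the alternate proposal in the comment concrete, here is a rough sketch of how a partitioner might spread a fixed number of quantiles (for example, as many as pig.exec.reducers.max) over whatever number of reducers the job actually gets. The class and method names are hypothetical and are not taken from Pig's WeightedPartitioner; the sketch only assumes that quantiles are numbered from 0 in sort order.

```java
// Hypothetical helper, not part of Pig: illustrates one way to distribute
// a fixed number of sampled quantiles over the actual reducer count.
final class QuantileSpread {

    private QuantileSpread() {}

    /**
     * Maps a 0-based quantile index to a 0-based partition index.
     * Consecutive quantiles go to the same or the next partition, so the
     * ordering between partitions is preserved and each partition receives
     * a near-equal number of quantiles.
     */
    static int partitionFor(int quantile, int numQuantiles, int numPartitions) {
        if (numQuantiles <= 0 || numPartitions <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        if (quantile < 0 || quantile >= numQuantiles) {
            throw new IllegalArgumentException("quantile index out of range");
        }
        // With more partitions than quantiles, only the first
        // numQuantiles partitions can receive data.
        int usable = Math.min(numPartitions, numQuantiles);
        // Use long arithmetic so the product cannot overflow an int.
        return (int) ((long) quantile * usable / numQuantiles);
    }
}
```

With 999 quantiles and 10 reducers, for instance, quantiles 0 through 99 would land on partition 0 and quantile 998 on partition 9. Keeping the mapping contiguous matters for order by, since a reducer must only see a single unbroken key range.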