[ https://issues.apache.org/jira/browse/PIG-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254431#comment-13254431 ]
Dmitriy V. Ryaboy commented on PIG-2652:
----------------------------------------

Looks like the third rule isn't correct either.

The problem seems to be that the SampleOptimizer used to do one thing, but now does (at least) two things. It used to remove an unnecessary MR job, as described in the class javadoc. As of PIG-1642, though, it's also responsible for reducer estimation. However, that optimization is not always possible, and when it isn't applied, reducer estimation doesn't happen either.

I think we should separate the two functionalities, either by reworking the SampleOptimizer code or by changing how WeightedPartitioner works. The former is less intrusive; the latter is probably the more architecturally sound solution.

Opinions?

> Skew join and order by don't trigger reducer estimation
> -------------------------------------------------------
>
>                 Key: PIG-2652
>                 URL: https://issues.apache.org/jira/browse/PIG-2652
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Bill Graham
>            Assignee: Bill Graham
>             Fix For: 0.10.0, 0.9.3, 0.11
>
>         Attachments: PIG-2652_1.patch
>
>
> If none of PARALLEL, default parallel, or {{mapred.reduce.tasks}} is set, the
> number of reducers is not estimated based on input size for skew joins or
> order by. Instead, these jobs get only 1 reducer.
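
For readers unfamiliar with the estimation being discussed, here is a rough sketch of sizing reducers from input bytes. It is not the SampleOptimizer or WeightedPartitioner code. The function names are hypothetical, and the two defaults are assumptions modeled on Pig's documented pig.exec.reducers.bytes.per.reducer and pig.exec.reducers.max settings.

```python
# Assumed defaults, modeled on Pig's documented settings
# pig.exec.reducers.bytes.per.reducer and pig.exec.reducers.max.
BYTES_PER_REDUCER = 1000 * 1000 * 1000
MAX_REDUCERS = 999


def estimate_reducers(total_input_bytes,
                      bytes_per_reducer=BYTES_PER_REDUCER,
                      max_reducers=MAX_REDUCERS):
    """Hypothetical helper: estimate a reducer count from total input size."""
    if total_input_bytes < 0:
        raise ValueError("total_input_bytes must be non-negative")
    # Integer ceiling division: one reducer per started chunk of input.
    estimated = -(-total_input_bytes // bytes_per_reducer)
    # Clamp to [1, max_reducers] so a job always gets at least one reducer.
    return max(1, min(max_reducers, estimated))


def choose_reducers(explicit_parallelism, total_input_bytes):
    """Hypothetical helper: an explicit setting wins, otherwise estimate.

    explicit_parallelism stands in for whichever of PARALLEL, default
    parallel, or mapred.reduce.tasks was set (None if none of them was).
    """
    if explicit_parallelism is not None:
        return explicit_parallelism
    return estimate_reducers(total_input_bytes)
```

With this sketch, a job with 5 GB of input and no explicit parallelism would get 5 reducers instead of the single reducer described in the issue.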