[ 
https://issues.apache.org/jira/browse/PIG-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254431#comment-13254431
 ] 

Dmitriy V. Ryaboy commented on PIG-2652:
----------------------------------------

Looks like the third rule isn't correct either.

The problem seems to be that the SampleOptimizer used to do one thing, but now 
does (at least) two things. It used to remove an unnecessary MR job, as 
described in the class javadoc. As of PIG-1642, though, it's also responsible 
for reducer estimation. However, the job-removal optimization is not always 
possible, and when it is skipped, reducer estimation is skipped along with it.

I think we should separate the two functionalities, either by reworking the 
SampleOptimizer code or by changing how WeightedPartitioner works. The former 
is less intrusive; the latter is probably the more architecturally sound 
solution.
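
To make the idea of separation concrete, here is a minimal sketch of reducer 
estimation as a standalone step that does not depend on whether the sample-job 
optimization was applied. The class and method names are hypothetical and are 
not Pig's actual code; the rule shown (input size divided by a bytes-per-reducer 
target, capped at a maximum) is only an assumption about what "estimated based 
on input size" means here.

```java
// Hypothetical sketch: these names are illustrative, not Pig's real classes.
final class ReducerEstimator {
    private ReducerEstimator() {}

    /**
     * Estimate reducers from input size:
     * ceil(totalInputBytes / bytesPerReducer), clamped to [1, maxReducers].
     */
    static int estimate(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        if (bytesPerReducer <= 0 || maxReducers <= 0) {
            throw new IllegalArgumentException(
                "bytesPerReducer and maxReducers must be positive");
        }
        // Ceiling division without floating point.
        long needed = (totalInputBytes + bytesPerReducer - 1) / bytesPerReducer;
        return (int) Math.max(1L, Math.min(needed, (long) maxReducers));
    }

    /**
     * An explicit parallelism setting (> 0) always wins. Otherwise fall back
     * to the estimate, regardless of which optimizations ran on the plan.
     */
    static int chooseParallelism(int requestedParallel, long totalInputBytes,
                                 long bytesPerReducer, int maxReducers) {
        if (requestedParallel > 0) {
            return requestedParallel;
        }
        return estimate(totalInputBytes, bytesPerReducer, maxReducers);
    }
}
```

With an assumed target of 1,000,000,000 bytes per reducer and an assumed cap of 
999, calling chooseParallelism(-1, 2_500_000_000L, 1_000_000_000L, 999) yields 
3 reducers instead of falling through to 1.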

Opinions?
                
> Skew join and order by don't trigger reducer estimation
> -------------------------------------------------------
>
>                 Key: PIG-2652
>                 URL: https://issues.apache.org/jira/browse/PIG-2652
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Bill Graham
>            Assignee: Bill Graham
>             Fix For: 0.10.0, 0.9.3, 0.11
>
>         Attachments: PIG-2652_1.patch
>
>
> If none of PARALLEL, default parallel, or {{mapred.reduce.tasks}} is set, the 
> number of reducers is not estimated based on input size for skew joins or 
> order by. Instead, these jobs get only 1 reducer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
