LIMIT is not effective when reducers are estimated higher than 1
----------------------------------------------------------------

                 Key: PIG-2524
                 URL: https://issues.apache.org/jira/browse/PIG-2524
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.9.2
            Reporter: Doug Daniels


If the user does not provide a default # of reducers on the operation or the 
script, pig estimates the number of reducers based on the input data size.  If 
that estimate yields > 1 reducer, then LIMIT operators can produce more results 
than they are supposed to.

This happens b/c the reducer estimation code 
(JobControlCompiler.estimateNumberOfReducers) runs after the LimitAdjuster code 
(MRCompiler.compile).  So, if the reducer estimation uses > 1 reducer, then the 
LimitAdjuster will not have added an extra 1-reducer MR job to enforce the 
proper limit.

This seems related to PIG-2295.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to