LIMIT is not effective when reducers are estimated higher than 1
----------------------------------------------------------------
Key: PIG-2524
URL: https://issues.apache.org/jira/browse/PIG-2524
Project: Pig
Issue Type: Bug
Affects Versions: 0.9.2
Reporter: Doug Daniels
If the user does not specify the number of reducers, either on the operation or
for the whole script, Pig estimates the number of reducers from the input data
size. If that estimate is greater than 1, LIMIT operators can return more rows
than the requested limit.
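
The symptom can be illustrated with a toy model. This is only a sketch, not Pig
code: run_limit is a hypothetical helper, and it assumes that rows are dealt
round-robin to the reducers and that each reducer applies the LIMIT to its own
partition with no final merge step.

```python
def run_limit(rows, num_reducers, limit):
    """Toy model (assumption): each reducer applies LIMIT to its own partition."""
    # Deal rows round-robin across the reducers.
    partitions = [rows[i::num_reducers] for i in range(num_reducers)]
    output = []
    for part in partitions:
        # Every reducer independently keeps at most `limit` rows.
        output.extend(part[:limit])
    return output


rows = list(range(100))
one = run_limit(rows, num_reducers=1, limit=10)
three = run_limit(rows, num_reducers=3, limit=10)
print(len(one), len(three))  # prints "10 30"
```

With one reducer the limit holds. With three reducers the model returns up to
three times the requested number of rows, unless a further single-reducer pass
trims the combined output.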
This happens because the reducer estimation code
(JobControlCompiler.estimateNumberOfReducers) runs after the LimitAdjuster code
(MRCompiler.compile). When the LimitAdjuster runs, the estimated reducer count
is not yet known, so it does not add the extra 1-reducer MR job that enforces
the limit. If the estimate then comes out greater than 1, the job runs with
several reducers and nothing enforces the proper limit across them.
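
The ordering problem can be sketched as follows. This is a simplified model
under stated assumptions, not the actual Pig implementation: Job, adjust_limit
and estimate_reducers are hypothetical stand-ins for the MR operator, the
LimitAdjuster and the reducer estimation, and the adjuster is assumed to add
the extra job only when it already knows the parallelism is greater than 1.

```python
from dataclasses import dataclass

UNSET = -1  # assumption: marker for "parallelism not specified by the user"


@dataclass
class Job:
    limit: int
    parallelism: int = UNSET


def adjust_limit(plan):
    """Append a 1-reducer job after each limited job known to use > 1 reducer."""
    out = []
    for job in plan:
        out.append(job)
        if job.limit != UNSET and job.parallelism > 1:
            out.append(Job(limit=job.limit, parallelism=1))
    return out


def estimate_reducers(plan, estimate):
    """Fill in parallelism for jobs where the user did not specify it."""
    for job in plan:
        if job.parallelism == UNSET:
            job.parallelism = estimate
    return plan


# Order described in this report: adjust first, estimate afterwards.
reported = estimate_reducers(adjust_limit([Job(limit=10)]), estimate=3)
# Alternative order: estimate first, then adjust.
reordered = adjust_limit(estimate_reducers([Job(limit=10)], estimate=3))

print([j.parallelism for j in reported])   # prints [3]
print([j.parallelism for j in reordered])  # prints [3, 1]
```

In the reported order the plan ends with a 3-reducer job and no limit-enforcing
job. In the alternative order the extra 1-reducer job is present. Whether
reordering is the appropriate fix in Pig itself is not established here.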
This seems related to PIG-2295.