[ https://issues.apache.org/jira/browse/PIG-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124489#comment-13124489 ]
Thejas M Nair commented on PIG-2316:
------------------------------------
The LogicalExpressionSimplifier optimization consists of a large number of
complex rules. This is the fourth correctness bug to originate from this
optimization rule, and that is a consequence of the rule's complexity. The code
is hard to understand and maintain. I don't expect these rules to deliver
enough performance gain to justify these costs (complexity, maintainability,
risk of bugs).
I think this rule should be disabled by default in 0.9.next and 0.10. In later
versions of Pig, we can extract simpler rules from this one and add more
exhaustive test coverage before turning it on by default.
> Incorrect results for FILTER *** BY ( *** OR ***) with
> FilterLogicExpressionSimplifier optimizer turned on
> ----------------------------------------------------------------------------------------------------------
>
> Key: PIG-2316
> URL: https://issues.apache.org/jira/browse/PIG-2316
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.9.1
> Reporter: Huanyu Zhao
> Priority: Critical
> Fix For: 0.8.1, 0.9.2
>
> Attachments: pig-2316-trunk-v1.txt
>
>
> An example for this bug:
> cat weird.txt
> 1,a
> 2,b
> 3,c
> When running Pig with the following statements:
> A = LOAD 'weird.txt' using PigStorage(',') AS (col1:int,col2);
> B = FILTER A BY ((col1==1) OR (col1 != 1));
> DUMP B;
> I expect to get all three rows back, but I receive only two rows:
> (2,b)
> (3,c)
> When we start Pig with the optimizer turned off:
> pig -optimizer_off All
> we get the expected result of three rows for the same statements:
> (1,a)
> (2,b)
> (3,c)
> --------------------------------------------------------
> This bug was tested on:
> pig-0.9.1,
> pig-0.9.0,
> pig-0.8.1,
> pig-0.8.0
> All produced the same incorrect results.
> --------------------------------------------------------
> When we looked at the logical plan for this example, we found that the
> FilterLogicExpressionSimplifier optimizer produced an incorrect logical plan,
> so we suspect that this optimizer causes the bug.
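The expected result in the report above can be checked against a small reference model of the filter predicate. The sketch below is only an illustration in Python, not Pig's implementation: the helper names (eq, ne, or3, filter_rows) are hypothetical, and the null handling assumes Pig's documented SQL-like semantics, where a comparison with null yields null and FILTER keeps a row only when the condition is true. Under those assumptions, (col1==1) OR (col1 != 1) holds for every row with a non-null col1, so any simplified plan should still return all three rows of weird.txt.

```python
def eq(x, y):
    # Comparison with a null operand yields null (modelled as None).
    return None if x is None or y is None else x == y


def ne(x, y):
    # Same null rule as eq, for the != operator.
    return None if x is None or y is None else x != y


def or3(a, b):
    # Three-valued OR: True dominates, then unknown (None), then False.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def filter_rows(rows):
    # Models: B = FILTER A BY ((col1==1) OR (col1 != 1));
    # A row is kept only when the predicate evaluates to exactly True.
    return [r for r in rows if or3(eq(r[0], 1), ne(r[0], 1)) is True]


# Contents of weird.txt from the report, as (col1, col2) tuples.
rows = [(1, "a"), (2, "b"), (3, "c")]
print(filter_rows(rows))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

A model like this could serve as an oracle when adding test coverage for the simplifier: the optimized and unoptimized plans should agree with it on every input, including rows where col1 is null.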