Incorrect results for FILTER *** BY ( *** OR ***) with 
FilterLogicExpressionSimplifier optimizer turned on
----------------------------------------------------------------------------------------------------------

                 Key: PIG-2316
                 URL: https://issues.apache.org/jira/browse/PIG-2316
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.9.0, 0.8.1, 0.8.0, 0.9.1
            Reporter: Huanyu Zhao
            Priority: Critical


An example of this bug:

cat weird.txt
1,a
2,b
3,c

When running Pig with the following statements:

A = LOAD 'weird.txt' using PigStorage(',') AS (col1:int,col2);
B = FILTER A BY ((col1==1) OR (col1 != 1));
DUMP B;

I expect to get all three rows back, but I receive only two; the row (1,a) is missing.

(2,b)
(3,c)

The problem goes away when we start Pig with the optimizer turned off:

pig -optimizer_off All

With the optimizer turned off, the same statements return the expected
three rows.

(1,a)
(2,b)
(3,c)
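
As a sanity check on the expected output, the filter predicate can be evaluated outside Pig. The following is only a minimal Python sketch, not Pig code: the names ROWS, predicate and filter_rows are made up for illustration, and the null branch assumes Pig's usual behavior of dropping rows whose filter condition evaluates to null.

```python
# Reference evaluation of the FILTER predicate outside Pig (illustrative only).
# ROWS mirrors the contents of weird.txt from the report.
ROWS = [(1, "a"), (2, "b"), (3, "c")]


def predicate(col1):
    """Evaluate (col1 == 1) OR (col1 != 1) for a single col1 value."""
    if col1 is None:
        # Assumption: a comparison against null yields null in Pig,
        # and FILTER drops rows whose condition is null.
        return False
    return (col1 == 1) or (col1 != 1)


def filter_rows(rows):
    """Keep the rows whose first field satisfies the predicate."""
    return [row for row in rows if predicate(row[0])]


if __name__ == "__main__":
    for row in filter_rows(ROWS):
        print(row)
```

For any non-null col1 the predicate is always true, so every row of weird.txt should pass the filter.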

--------------------------------------------------------

This bug was tested on:

pig-0.9.1, 
pig-0.9.0, 
pig-0.8.1, 
pig-0.8.0

All produced the same incorrect results.

--------------------------------------------------------

When we looked at the logical plan for this example, we found that the
FilterLogicExpressionSimplifier optimizer had produced an incorrect logical
plan, so we suspect that this optimizer rule is the cause of the bug.


        
