Aihua Xu created HIVE-12762:
-------------------------------

             Summary: Common join on parquet tables returns incorrect result 
when hive.optimize.index.filter set to true
                 Key: HIVE-12762
                 URL: https://issues.apache.org/jira/browse/HIVE-12762
             Project: Hive
          Issue Type: Bug
          Components: Logical Optimizer
    Affects Versions: 2.1.0
            Reporter: Aihua Xu
            Assignee: Aihua Xu


The following query returns an incorrect result:
{noformat}
CREATE TABLE tbl1(id INT) STORED AS PARQUET;
INSERT INTO tbl1 VALUES(1), (2);

CREATE TABLE tbl2(id INT, value STRING) STORED AS PARQUET;
INSERT INTO tbl2 VALUES(1, 'value1');
INSERT INTO tbl2 VALUES(1, 'value2');

set hive.optimize.index.filter = true;
set hive.auto.convert.join=false;
select tbl1.id, t1.value, t2.value
FROM tbl1
JOIN (SELECT * FROM tbl2 WHERE value='value1') t1 ON tbl1.id=t1.id
JOIN (SELECT * FROM tbl2 WHERE value='value2') t2 ON tbl1.id=t2.id;
{noformat}

Setting hive.auto.convert.join=false forces a common join. After the 2 
insertions, tbl2 consists of 2 files underneath.
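
A possible way to confirm the two-file layout, as a sketch only: Hive's 
INPUT__FILE__NAME virtual column reports the file each row was read from. The 
actual file names depend on the warehouse location and are not shown here.
{noformat}
-- Sketch: list the distinct data files backing tbl2.
-- Two distinct values are expected after the two INSERT statements above.
SELECT DISTINCT INPUT__FILE__NAME FROM tbl2;
{noformat}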

The map job contains 3 TableScan operators (2 for tbl2 and 1 for tbl1). When 
hive.optimize.index.filter is set to true, we incorrectly apply the later 
filter condition to each block, so no data is returned for the subquery 
{{SELECT * FROM tbl2 WHERE value='value1'}}.
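
To observe this, one option is to inspect the plan and then rerun the query 
with the optimization disabled. This is a hedged sketch, not a verified fix: it 
assumes the problem does not appear when hive.optimize.index.filter is false. 
By plain join semantics the query should return exactly one row.
{noformat}
-- Inspect the plan: look for the three TableScan operators and any
-- filter expression attached to each of the two tbl2 scans.
set hive.optimize.index.filter=true;
set hive.auto.convert.join=false;
EXPLAIN
SELECT tbl1.id, t1.value, t2.value
FROM tbl1
JOIN (SELECT * FROM tbl2 WHERE value='value1') t1 ON tbl1.id=t1.id
JOIN (SELECT * FROM tbl2 WHERE value='value2') t2 ON tbl1.id=t2.id;

-- Comparison run with the optimization disabled (assumed to avoid the bug).
-- Expected row by join semantics: 1, value1, value2
set hive.optimize.index.filter=false;
SELECT tbl1.id, t1.value, t2.value
FROM tbl1
JOIN (SELECT * FROM tbl2 WHERE value='value1') t1 ON tbl1.id=t1.id
JOIN (SELECT * FROM tbl2 WHERE value='value2') t2 ON tbl1.id=t2.id;
{noformat}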



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
