Aihua Xu created HIVE-12762: ------------------------------- Summary: Common join on parquet tables returns incorrect result when hive.optimize.index.filter set to true Key: HIVE-12762 URL: https://issues.apache.org/jira/browse/HIVE-12762 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 2.1.0 Reporter: Aihua Xu Assignee: Aihua Xu
The following query will give incorrect result. {noformat} CREATE TABLE tbl1(id INT) STORED AS PARQUET; INSERT INTO tbl1 VALUES(1), (2); CREATE TABLE tbl2(id INT, value STRING) STORED AS PARQUET; INSERT INTO tbl2 VALUES(1, 'value1'); INSERT INTO tbl2 VALUES(1, 'value2'); set hive.optimize.index.filter = true; set hive.auto.convert.join=false; select tbl1.id, t1.value, t2.value FROM tbl1 JOIN (SELECT * FROM tbl2 WHERE value='value1') t1 ON tbl1.id=t1.id JOIN (SELECT * FROM tbl2 WHERE value='value2') t2 ON tbl1.id=t2.id; {noformat} We are enforcing to use common join and tbl2 will have 2 files after 2 insertions underneath. the map job contains 3 TableScan operators (2 for tbl2 and 1 for tbl1). When hive.optimize.index.filter is set to true, we are incorrectly applying the later filtering condition to each block, which causes no data is returned for the subquery {{SELECT * FROM tbl2 WHERE value='value1'}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)