dima machlin created HIVE-5964: ---------------------------------- Summary: Hive missing a filter predicate causing wrong results joining tables after sort by Key: HIVE-5964 URL: https://issues.apache.org/jira/browse/HIVE-5964 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.10.0 Reporter: dima machlin
It seems like the optimization of predicate pushdown is failing under certain conditions causing wrong results as a filter predicate appears to be completely disregarded by the query processor for some reason. Here is the scenario (assuming "dual" table exists) : set hive.optimize.ppd=true; drop table if exists test_tbl ; create table test_tbl (id string,name string); insert into table test_tbl select 'a','b' from dual; test_tbl now contains : a b the following query : select t2.* from (select id,name from (select id,name from test_tbl) t1 sort by id) t2 join test_tbl t3 on (t2.id=t3.id ) where t2.name='c' and t3.id='a'; returns : a b The filter :" t2.name='c' " is missing from the execution plan and obviously doesn't apply. The filter "t3.id='a' " does appear in the plan and is being applied before the join. If the query changes a little bit like removing the sort by, removing the t1 sub-query or disabling hive.optimize.ppd then the predicate appears. I'm able to reproduce the problem both in Hive 0.10 and Hive 0.11 although It seems to work fine in Hive 0.7 -- This message was sent by Atlassian JIRA (v6.1#6144)