Ted Xu created HIVE-11576:
-----------------------------

             Summary: Data loss in MapJoin
                 Key: HIVE-11576
                 URL: https://issues.apache.org/jira/browse/HIVE-11576
             Project: Hive
          Issue Type: Bug
    Affects Versions: 1.2.0
            Reporter: Ted Xu
            Assignee: Matt McCline

The following query (TPC-H query 4) loses data when MapJoin is enabled:

{code:title=query4.sql|borderStyle=solid}
create table q4_result as
select o_orderpriority, count(*) as order_count
from orders o
join (
  select distinct l_orderkey
  from (
    select * from lineitem
    where l_commitdate < l_receiptdate
  ) tab1
) tab2
on tab2.l_orderkey = o.o_orderkey
where o.o_orderdate >= '1993-07-01'
  and o.o_orderdate < '1993-10-01'
group by o_orderpriority
order by o_orderpriority;
{code}

Both sides of the join produce the expected output, but some rows that should match are not joined. After disabling auto convert join (hive.auto.convert.join), the problem goes away.

Context:
l_orderkey and o_orderkey are bigint.
Vectorized execution is enabled.
The execution engine is Tez.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)