Ted Xu created HIVE-11576:
-----------------------------
Summary: Data loss in MapJoin
Key: HIVE-11576
URL: https://issues.apache.org/jira/browse/HIVE-11576
Project: Hive
Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Ted Xu
Assignee: Matt McCline
Consider the following query (TPC-H query 4):
{code:title=query4.sql|borderStyle=solid}
create table q4_result as
select
  o_orderpriority,
  count(*) as order_count
from
  orders o
  join (
    select distinct
      l_orderkey
    from (
      select
        *
      from
        lineitem
      where
        l_commitdate < l_receiptdate
    ) tab1
  ) tab2
  on tab2.l_orderkey = o.o_orderkey
where
  o.o_orderdate >= '1993-07-01'
  and o.o_orderdate < '1993-10-01'
group by
  o_orderpriority
order by
  o_orderpriority;
{code}
This query loses data when MapJoin is enabled. Both sides of the join
produce the expected output on their own, but some matching rows are not
joined together. After disabling auto convert join, the problem goes away.
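For reference, a minimal sketch of the workaround, assuming "auto convert join" refers to the hive.auto.convert.join session property:
{code:title=workaround.sql|borderStyle=solid}
-- Assumption: this is the setting the report means by "auto convert join".
-- With it off, Hive no longer converts the common join to a MapJoin automatically.
set hive.auto.convert.join=false;
{code}
This only steers the session away from the MapJoin path; run it before the query above in the same session.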
Context:
l_orderkey and o_orderkey are both bigint.
Vectorized execution is enabled.
The execution engine is Tez.
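The context above roughly corresponds to session settings like the ones below. The property names are standard Hive settings, but the exact values in the reporter's session are an assumption, since the report does not list them:
{code:title=repro_settings.sql|borderStyle=solid}
-- Assumed session settings matching the described context (not taken from the report).
set hive.execution.engine=tez;
set hive.vectorized.execution.enabled=true;
-- MapJoin conversion left on, which is the case where the data loss is reported.
set hive.auto.convert.join=true;
{code}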