Ted Xu created HIVE-11576:
-----------------------------

             Summary: Data loss in MapJoin
                 Key: HIVE-11576
                 URL: https://issues.apache.org/jira/browse/HIVE-11576
             Project: Hive
          Issue Type: Bug
    Affects Versions: 1.2.0
            Reporter: Ted Xu
            Assignee: Matt McCline


The following query (TPC-H query 4) returns incomplete results:

{code:title=query4.sql|borderStyle=solid}
create table q4_result as 
select 
o_orderpriority, 
count(*) as order_count 
from 
orders o 
join 
( 
select 
distinct l_orderkey 
from 
( 
select 
* 
from 
lineitem 
where 
l_commitdate < l_receiptdate 
) tab1 
) tab2 
on tab2.l_orderkey = o.o_orderkey 
where 
o.o_orderdate >= '1993-07-01' and o.o_orderdate < '1993-10-01' 
group by 
o_orderpriority 
order by 
o_orderpriority;
{code}

The query loses data when MapJoin is enabled. Both sides of the join produce 
the expected output, but some matching rows are not joined. After disabling 
auto convert join, the problem goes away.

Context:
l_orderkey and o_orderkey are bigint.
Vectorized execution is enabled.
The execution engine is Tez.
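
For reference, the context above presumably corresponds to session settings along these lines. This is only a sketch: the property names are the standard Hive ones and were not copied from the original session, and the block title is just a label.

{code:title=settings.sql|borderStyle=solid}
-- Configuration under which the data loss was observed (assumed values)
set hive.execution.engine=tez;
set hive.vectorized.execution.enabled=true;
set hive.auto.convert.join=true;

-- Workaround: disable automatic conversion to MapJoin
set hive.auto.convert.join=false;
{code}

Running query4.sql once with each value of hive.auto.convert.join and comparing order_count per o_orderpriority should show which rows are dropped.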



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
