[jira] [Commented] (HIVE-11576) Data loss in MapJoin

Ted Xu (JIRA) Tue, 18 Aug 2015 19:38:57 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702338#comment-14702338
 ]


Ted Xu commented on HIVE-11576:
-------------------------------

Thanks [~gopalv] for looking into this.

I'm running TPC-H at the 1 TB scale. Note that I replaced every int column in the schema with bigint.

> Data loss in MapJoin
> --------------------
>
>                 Key: HIVE-11576
>                 URL: https://issues.apache.org/jira/browse/HIVE-11576
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Ted Xu
>            Assignee: Matt McCline
>
> In query (TPC-H query4)
> {code:title=query4.sql|borderStyle=solid}
> create table q4_result as 
> select 
> o_orderpriority, 
> count(*) as order_count 
> from 
> orders o 
> join 
> ( 
> select 
> distinct l_orderkey 
> from 
> ( 
> select 
> * 
> from 
> lineitem 
> where 
> l_commitdate < l_receiptdate 
> ) tab1 
> ) tab2 
> on tab2.l_orderkey = o.o_orderkey 
> where 
> o.o_orderdate >= '1993-07-01' and o.o_orderdate < '1993-10-01' 
> group by 
> o_orderpriority 
> order by 
> o_orderpriority;
> {code}
> The query loses data when MapJoin is enabled. Both sides of the join produce
> the expected output, but some rows fail to join. After disabling
> auto convert join, the problem is gone.
> Context:
> l_orderkey & o_orderkey are bigint.
> vectorized execution enabled.
> execution engine is tez.
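> A minimal sketch of session settings matching the context above, assuming the
> standard Hive property names. The exact settings used for this report are not
> recorded here, and the file name settings.sql is hypothetical.
> {code:title=settings.sql|borderStyle=solid}
> -- Context described in this report (assumed property names)
> set hive.execution.engine=tez;
> set hive.vectorized.execution.enabled=true;
> -- Automatic MapJoin conversion on: the case where rows go missing
> set hive.auto.convert.join=true;
> -- Workaround from the description: turn automatic MapJoin conversion off
> -- set hive.auto.convert.join=false;
> {code}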



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
