[ 
https://issues.apache.org/jira/browse/TEZ-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

inza9hi updated TEZ-4377:
-------------------------
    Fix Version/s: 0.9.1

> Got wrong result when using inner join and left join and window function
> ------------------------------------------------------------------------
>
>                 Key: TEZ-4377
>                 URL: https://issues.apache.org/jira/browse/TEZ-4377
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: inza9hi
>            Priority: Major
>             Fix For: 0.9.1
>
>
> I am using Hive 3.1.0 and Tez 0.9.1.
> The sql looks just like this:
>  
> {code:java}
> select 
> count(innerCode),
> count(esdate)
> from 
> (
>   select t1.secuCode
>         ,t1.innercode
>     from
>         (select innerCode
>             ,secuCode
>             ,cate
>         from tmp.test_a1
>         where  cate in (8,13)
>         ) t1
>     inner join
>         (
>         select secuCode
>         from tmp.test_a2
>         where type not in ('pb','pbf','licai')
>         ) t2
>     on t1.secuCode = t2.secuCode
> ) a 
> left join 
>     (
>     select innercode
>         ,esdate
>     
>     from tmp.test_b
>     ) e on a.innerCode = e.innercode
> left join 
>   (
>     select innerCode
>         ,1 as if_gold_prize
>     from tmp.test_c
>     where winnertype = 1
>     group by innerCode
>   ) f1 on a.innerCode = f1.innerCode
> left join 
>       (
>       select innercode
>           ,endate enddate1
>       from
>          (
>           select *
>               ,row_number() over (partition by innercode order by endate 
> desc) max_date
>           from tmp.test_d
>          ) t
>       where max_date = 1
>       ) m on a.innerCode = m.innercode
> left join 
>       (
>       select innercode
>           ,endate enddate2
>       from
>          (
>           select *
>               ,row_number() over (partition by innercode order by endate 
> desc) max_date
>           from tmp.test_e
>           ) t
>       where max_date = 1
>       ) n on a.innerCode = n.innercode {code}
> The num of the records is right, but  there are lots of null *esdate* ( from 
> the first join) which should not be null. 
> Sorry I can not reproduce the issue with small data sets.
> But something I can give:
>  # Most of the tables are very small (about 20MB)
>  # Either I remove the inner join or remove any left join, the result is 
> right.
>  # Disable the map join and cbo did not work either (
> --hiveconf hive.auto.convert.join.noconditionaltask =false --hiveconf 
> hive.auto.convert.join=false --hiveconf hive.cbo.enable=false)
>  # Disable the sql works well on MR, so is it a Tez issue?
>  # I tried the latest tez release (0.10.1), there is still the issue
>  
>  
> Sorry it can not reproduces in



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to