[ 
https://issues.apache.org/jira/browse/TEZ-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478353#comment-17478353
 ] 

inza9hi edited comment on TEZ-4377 at 1/19/22, 6:39 AM:
--------------------------------------------------------

[~abstractdog] For your notes:
 # I will try that.
 # It is already the simplest structure to reproduce the issue.
 # I disabled both hive.vectorized.execution.reduce.enabled and 
hive.vectorized.execution.enabled, the issue is still there.

If I remove the inner join, the result is right, and I have compared the two 
explain plan, the explain is right.


was (Author: inza9hi):
[~abstractdog] For your notes:
 # I will try that.
 # It is already the simplest structure to reproduce the issue.
 # I disabled both hive.vectorized.execution.reduce.enabled and 
hive.vectorized.execution.enabled, the issue is still there.

If I remove the inner join, the result is right.

> Got wrong result when using inner join and left join and window function
> ------------------------------------------------------------------------
>
>                 Key: TEZ-4377
>                 URL: https://issues.apache.org/jira/browse/TEZ-4377
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.9.1, 0.10.1
>            Reporter: inza9hi
>            Priority: Major
>
> I am using Hive 3.1.0 and Tez 0.9.1.
> The sql looks just like this:
>  
> {code:java}
> select 
> count(innerCode),
> count(esdate)
> from 
> (
>   select t1.secuCode
>         ,t1.innercode
>     from
>         (select innerCode
>             ,secuCode
>             ,cate
>         from tmp.test_a1
>         where  cate in (8,13)
>         ) t1
>     inner join
>         (
>         select secuCode
>         from tmp.test_a2
>         where type not in ('pb','pbf','licai')
>         ) t2
>     on t1.secuCode = t2.secuCode
> ) a 
> left join 
>     (
>     select innercode
>         ,esdate
>     
>     from tmp.test_b
>     ) e on a.innerCode = e.innercode
> left join 
>   (
>     select innerCode
>         ,1 as if_gold_prize
>     from tmp.test_c
>     where winnertype = 1
>     group by innerCode
>   ) f1 on a.innerCode = f1.innerCode
> left join 
>       (
>       select innercode
>           ,endate enddate1
>       from
>          (
>           select *
>               ,row_number() over (partition by innercode order by endate 
> desc) max_date
>           from tmp.test_d
>          ) t
>       where max_date = 1
>       ) m on a.innerCode = m.innercode
> left join 
>       (
>       select innercode
>           ,endate enddate2
>       from
>          (
>           select *
>               ,row_number() over (partition by innercode order by endate 
> desc) max_date
>           from tmp.test_e
>           ) t
>       where max_date = 1
>       ) n on a.innerCode = n.innercode {code}
> The num of the records is right, but  ** in the result there are lots of null 
> *esdate* ( from the first left join) which {*}should not be null{*}.
> Sorry I can not reproduce the issue with small data sets.
> But something I can give:
>  # Most of the tables are very small (about 20MB)
>  # Either I remove the inner join or remove any left join, the result is 
> right.
>  # Disable the map join and cbo did not work either (
> --hiveconf hive.auto.convert.join.noconditionaltask =false --hiveconf 
> hive.auto.convert.join=false --hiveconf hive.cbo.enable=false)
>  # I tried the latest tez release (0.10.1), there is still the issue
> The sql works well on MR, so i think it is a Tez issue. I am not very 
> familiar with Tez. Any suggestion to find the root cause?
> Thank you in advance.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to