[
https://issues.apache.org/jira/browse/TEZ-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478353#comment-17478353
]
inza9hi edited comment on TEZ-4377 at 1/19/22, 6:39 AM:
--------------------------------------------------------
[~abstractdog] For your notes:
# I will try that.
# It is already the simplest structure to reproduce the issue.
# I disabled both hive.vectorized.execution.reduce.enabled and
hive.vectorized.execution.enabled, the issue is still there.
If I remove the inner join, the result is right, and I have compared the two
explain plan, the explain is right.
was (Author: inza9hi):
[~abstractdog] For your notes:
# I will try that.
# It is already the simplest structure to reproduce the issue.
# I disabled both hive.vectorized.execution.reduce.enabled and
hive.vectorized.execution.enabled, the issue is still there.
If I remove the inner join, the result is right.
> Got wrong result when using inner join and left join and window function
> ------------------------------------------------------------------------
>
> Key: TEZ-4377
> URL: https://issues.apache.org/jira/browse/TEZ-4377
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.9.1, 0.10.1
> Reporter: inza9hi
> Priority: Major
>
> I am using Hive 3.1.0 and Tez 0.9.1.
> The sql looks just like this:
>
> {code:java}
> select
> count(innerCode),
> count(esdate)
> from
> (
> select t1.secuCode
> ,t1.innercode
> from
> (select innerCode
> ,secuCode
> ,cate
> from tmp.test_a1
> where cate in (8,13)
> ) t1
> inner join
> (
> select secuCode
> from tmp.test_a2
> where type not in ('pb','pbf','licai')
> ) t2
> on t1.secuCode = t2.secuCode
> ) a
> left join
> (
> select innercode
> ,esdate
>
> from tmp.test_b
> ) e on a.innerCode = e.innercode
> left join
> (
> select innerCode
> ,1 as if_gold_prize
> from tmp.test_c
> where winnertype = 1
> group by innerCode
> ) f1 on a.innerCode = f1.innerCode
> left join
> (
> select innercode
> ,endate enddate1
> from
> (
> select *
> ,row_number() over (partition by innercode order by endate
> desc) max_date
> from tmp.test_d
> ) t
> where max_date = 1
> ) m on a.innerCode = m.innercode
> left join
> (
> select innercode
> ,endate enddate2
> from
> (
> select *
> ,row_number() over (partition by innercode order by endate
> desc) max_date
> from tmp.test_e
> ) t
> where max_date = 1
> ) n on a.innerCode = n.innercode {code}
> The num of the records is right, but ** in the result there are lots of null
> *esdate* ( from the first left join) which {*}should not be null{*}.
> Sorry I can not reproduce the issue with small data sets.
> But something I can give:
> # Most of the tables are very small (about 20MB)
> # Either I remove the inner join or remove any left join, the result is
> right.
> # Disable the map join and cbo did not work either (
> --hiveconf hive.auto.convert.join.noconditionaltask =false --hiveconf
> hive.auto.convert.join=false --hiveconf hive.cbo.enable=false)
> # I tried the latest tez release (0.10.1), there is still the issue
> The sql works well on MR, so i think it is a Tez issue. I am not very
> familiar with Tez. Any suggestion to find the root cause?
> Thank you in advance.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)