[
https://issues.apache.org/jira/browse/TEZ-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
inza9hi updated TEZ-4377:
-------------------------
Affects Version/s: 0.10.1
0.9.1
> Got wrong result when using inner join and left join and window function
> ------------------------------------------------------------------------
>
> Key: TEZ-4377
> URL: https://issues.apache.org/jira/browse/TEZ-4377
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.9.1, 0.10.1
> Reporter: inza9hi
> Priority: Major
>
> I am using Hive 3.1.0 and Tez 0.9.1.
> The sql looks just like this:
>
> {code:java}
> select
> count(innerCode),
> count(esdate)
> from
> (
> select t1.secuCode
> ,t1.innercode
> from
> (select innerCode
> ,secuCode
> ,cate
> from tmp.test_a1
> where cate in (8,13)
> ) t1
> inner join
> (
> select secuCode
> from tmp.test_a2
> where type not in ('pb','pbf','licai')
> ) t2
> on t1.secuCode = t2.secuCode
> ) a
> left join
> (
> select innercode
> ,esdate
>
> from tmp.test_b
> ) e on a.innerCode = e.innercode
> left join
> (
> select innerCode
> ,1 as if_gold_prize
> from tmp.test_c
> where winnertype = 1
> group by innerCode
> ) f1 on a.innerCode = f1.innerCode
> left join
> (
> select innercode
> ,endate enddate1
> from
> (
> select *
> ,row_number() over (partition by innercode order by endate
> desc) max_date
> from tmp.test_d
> ) t
> where max_date = 1
> ) m on a.innerCode = m.innercode
> left join
> (
> select innercode
> ,endate enddate2
> from
> (
> select *
> ,row_number() over (partition by innercode order by endate
> desc) max_date
> from tmp.test_e
> ) t
> where max_date = 1
> ) n on a.innerCode = n.innercode {code}
> The num of the records is right, but there are lots of null *esdate* ( from
> the first join) which should not be null.
> Sorry I can not reproduce the issue with small data sets.
> But something I can give:
> # Most of the tables are very small (about 20MB)
> # Either I remove the inner join or remove any left join, the result is
> right.
> # Disable the map join and cbo did not work either (
> --hiveconf hive.auto.convert.join.noconditionaltask =false --hiveconf
> hive.auto.convert.join=false --hiveconf hive.cbo.enable=false)
> # Disable the sql works well on MR, so is it a Tez issue?
> # I tried the latest tez release (0.10.1), there is still the issue
>
>
> Sorry it can not reproduces in
--
This message was sent by Atlassian Jira
(v8.20.1#820001)