[ 
https://issues.apache.org/jira/browse/TEZ-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

inza9hi updated TEZ-4377:
-------------------------
    Description: 
I am using Hive 3.1.0 and Tez 0.9.1.

The sql looks just like this:

 
{code:java}
select 
count(innerCode),
count(esdate)
from 
(
  select t1.secuCode
        ,t1.innercode
    from
        (select innerCode
            ,secuCode
            ,cate
        from tmp.test_a1
        where  cate in (8,13)
        ) t1
    inner join
        (
        select secuCode
        from tmp.test_a2
        where type not in ('pb','pbf','licai')
        ) t2
    on t1.secuCode = t2.secuCode
) a 

left join 
    (
    select innercode
        ,esdate
    
    from tmp.test_b
    ) e on a.innerCode = e.innercode

left join 
  (
    select innerCode
        ,1 as if_gold_prize
    from tmp.test_c
    where winnertype = 1
    group by innerCode
  ) f1 on a.innerCode = f1.innerCode


left join 
      (
      select innercode
          ,endate enddate1
      from
         (
          select *
              ,row_number() over (partition by innercode order by endate desc) 
max_date
          from tmp.test_d
         ) t
      where max_date = 1
      ) m on a.innerCode = m.innercode

left join 
      (
      select innercode
          ,endate enddate2
      from
         (
          select *
              ,row_number() over (partition by innercode order by endate desc) 
max_date
          from tmp.test_e
          ) t
      where max_date = 1
      ) n on a.innerCode = n.innercode {code}
The num of the records is right, but  there are lots of null *esdate* ( from 
the first left join) which should not be null. 

Sorry I can not reproduce the issue with small data sets.

But something I can give:
 # Most of the tables are very small (about 20MB)
 # Either I remove the inner join or remove any left join, the result is right.
 # Disable the map join and cbo did not work either (
--hiveconf hive.auto.convert.join.noconditionaltask =false --hiveconf 
hive.auto.convert.join=false --hiveconf hive.cbo.enable=false)
 # I tried the latest tez release (0.10.1), there is still the issue

 

The sql works well on MR, so i think it is a Tez issue. I am not very familiar 
with Tez. Any suggestion to find the root cause?

Thank you in advance.

  was:
I am using Hive 3.1.0 and Tez 0.9.1.

The sql looks just like this:

 
{code:java}
select 
count(innerCode),
count(esdate)
from 
(
  select t1.secuCode
        ,t1.innercode
    from
        (select innerCode
            ,secuCode
            ,cate
        from tmp.test_a1
        where  cate in (8,13)
        ) t1
    inner join
        (
        select secuCode
        from tmp.test_a2
        where type not in ('pb','pbf','licai')
        ) t2
    on t1.secuCode = t2.secuCode
) a 

left join 
    (
    select innercode
        ,esdate
    
    from tmp.test_b
    ) e on a.innerCode = e.innercode

left join 
  (
    select innerCode
        ,1 as if_gold_prize
    from tmp.test_c
    where winnertype = 1
    group by innerCode
  ) f1 on a.innerCode = f1.innerCode


left join 
      (
      select innercode
          ,endate enddate1
      from
         (
          select *
              ,row_number() over (partition by innercode order by endate desc) 
max_date
          from tmp.test_d
         ) t
      where max_date = 1
      ) m on a.innerCode = m.innercode

left join 
      (
      select innercode
          ,endate enddate2
      from
         (
          select *
              ,row_number() over (partition by innercode order by endate desc) 
max_date
          from tmp.test_e
          ) t
      where max_date = 1
      ) n on a.innerCode = n.innercode {code}
The num of the records is right, but  there are lots of null *esdate* ( from 
the first join) which should not be null. 

Sorry I can not reproduce the issue with small data sets.

But something I can give:
 # Most of the tables are very small (about 20MB)
 # Either I remove the inner join or remove any left join, the result is right.
 # Disable the map join and cbo did not work either (
--hiveconf hive.auto.convert.join.noconditionaltask =false --hiveconf 
hive.auto.convert.join=false --hiveconf hive.cbo.enable=false)
 # I tried the latest tez release (0.10.1), there is still the issue

 

The sql works well on MR, so i think it is a Tez issue. I am not very familiar 
with Tez. Any suggestion to find the root cause?

Thank you in advance.


> Got wrong result when using inner join and left join and window function
> ------------------------------------------------------------------------
>
>                 Key: TEZ-4377
>                 URL: https://issues.apache.org/jira/browse/TEZ-4377
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.9.1, 0.10.1
>            Reporter: inza9hi
>            Priority: Major
>
> I am using Hive 3.1.0 and Tez 0.9.1.
> The sql looks just like this:
>  
> {code:java}
> select 
> count(innerCode),
> count(esdate)
> from 
> (
>   select t1.secuCode
>         ,t1.innercode
>     from
>         (select innerCode
>             ,secuCode
>             ,cate
>         from tmp.test_a1
>         where  cate in (8,13)
>         ) t1
>     inner join
>         (
>         select secuCode
>         from tmp.test_a2
>         where type not in ('pb','pbf','licai')
>         ) t2
>     on t1.secuCode = t2.secuCode
> ) a 
> left join 
>     (
>     select innercode
>         ,esdate
>     
>     from tmp.test_b
>     ) e on a.innerCode = e.innercode
> left join 
>   (
>     select innerCode
>         ,1 as if_gold_prize
>     from tmp.test_c
>     where winnertype = 1
>     group by innerCode
>   ) f1 on a.innerCode = f1.innerCode
> left join 
>       (
>       select innercode
>           ,endate enddate1
>       from
>          (
>           select *
>               ,row_number() over (partition by innercode order by endate 
> desc) max_date
>           from tmp.test_d
>          ) t
>       where max_date = 1
>       ) m on a.innerCode = m.innercode
> left join 
>       (
>       select innercode
>           ,endate enddate2
>       from
>          (
>           select *
>               ,row_number() over (partition by innercode order by endate 
> desc) max_date
>           from tmp.test_e
>           ) t
>       where max_date = 1
>       ) n on a.innerCode = n.innercode {code}
> The num of the records is right, but  there are lots of null *esdate* ( from 
> the first left join) which should not be null. 
> Sorry I can not reproduce the issue with small data sets.
> But something I can give:
>  # Most of the tables are very small (about 20MB)
>  # Either I remove the inner join or remove any left join, the result is 
> right.
>  # Disable the map join and cbo did not work either (
> --hiveconf hive.auto.convert.join.noconditionaltask =false --hiveconf 
> hive.auto.convert.join=false --hiveconf hive.cbo.enable=false)
>  # I tried the latest tez release (0.10.1), there is still the issue
>  
> The sql works well on MR, so i think it is a Tez issue. I am not very 
> familiar with Tez. Any suggestion to find the root cause?
> Thank you in advance.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to