FengZhou created IMPALA-13428:
---------------------------------

             Summary: Using a left join with a window function results in 
incorrect outcomes.
                 Key: IMPALA-13428
                 URL: https://issues.apache.org/jira/browse/IMPALA-13428
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 4.1.2
            Reporter: FengZhou


When using a window function, and performing a left join, filtering based on 
the field used by the window function, inconsistencies can occur during the 
pushdown process.
{code:java}
create table if not exists test.a(
       a string
       , b string
);

insert into test.a
values ('a', '1')
       , ('b', '2')
;


create table if not exists test.b(
       a string
       , b string
       , c string
);

insert into test.b
values ('a', '1', '1')
       , ('b', '2', '2')
       , ('c', '3', '3')
       , ('c', '3', '4')
       , ('c', '3', '5')
;   


select *
from (SELECT  
              t2.b
      FROM test.a t1
 LEFT JOIN (SELECT a
          , ROW_NUMBER() OVER(PARTITION BY a ORDER BY b) b
         -- , count(1) b
         FROM test.b) T2
ON t1.a=t2.a
) t
where b = 10{code}
The correct result should have no records at all. However, in version 4.1.2, 
the result that was executed contains 4 records with *NULL* values.

Here is the execution plan.

 
{code:java}
PLAN-ROOT SINK
|
09:EXCHANGE [UNPARTITIONED]
|
05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]
|  hash predicates: a = t1.a
|  row-size=44B cardinality=224
|
|--08:EXCHANGE [HASH(t1.a)]
|  |
|  00:SCAN HDFS [test.a t1]
|     HDFS partitions=1/1 files=2 size=1.06KB
|     row-size=12B cardinality=224
|
04:SELECT
|  predicates: row_number() = 10   <---- There seems to be a problem here. 
|  row-size=32B cardinality=218
|
03:ANALYTIC
|  functions: row_number()
|  partition by: a
|  order by: b ASC
|  window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
|  row-size=32B cardinality=218
|
07:TOP-N
|  partition by: a
|  order by: b ASC
|  partition limit: 10
|  row-size=24B cardinality=218
|
06:EXCHANGE [HASH(a)]
|
02:TOP-N
|  partition by: a
|  order by: b ASC
|  partition limit: 10
|  source expr: row_number() = CAST(10 AS BIGINT)
|  row-size=24B cardinality=218
|
01:SCAN HDFS [test.b]
   HDFS partitions=1/1 files=2 size=1.54KB
   row-size=24B cardinality=218{code}
 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to