Aman Sinha created IMPALA-11030:
-----------------------------------
Summary: Wrong result due to predicate pushdown into inline view
with Analytic function
Key: IMPALA-11030
URL: https://issues.apache.org/jira/browse/IMPALA-11030
Project: IMPALA
Issue Type: Bug
Components: Frontend
Affects Versions: Impala 3.4.0
Reporter: Aman Sinha
Assignee: Aman Sinha
Table DDL and population:
{noformat}
create table t1( c1 int, c2 char(1));
insert into t1 values (1,cast('P' as char(1))),(2,cast('P' as
char(1))),(3,cast('P' as char(1))),(4,cast('N' as char(1))),(5,cast('P' as
char(1))),(6, cast('N' as char(1))),(7, cast('P' as char(1))),(8, cast('N' as
char(1))),(9, cast('N' as char(1))),(10, cast('N' as char(1))), (11,cast('P' as
char(1))),(12,cast('N' as char(1))),(13,cast('P' as char(1))),(14,cast('N' as
char(1))),(15,cast('N' as char(1))),(16, cast('N' as char(1))),(17, cast('P' as
char(1))),(18, cast('N' as char(1))),(19, cast('P' as char(1))),(20, cast('N'
as char(1)));
{noformat}
{noformat}
default> select * from t1;
------+
c1 c2
------+
11 P
12 N
13 P
14 N
15 N
16 N
17 P
18 N
19 P
20 N
1 P
2 P
3 P
4 N
5 P
6 N
7 P
8 N
9 N
10 N
------+
The following query produces a wrong num_row() for num_ranks column.
default> select * from (select c1, c2 , row_number() over(order by c1) as
num_ranks, row_number() over( partition by c2 order by c1) as prime_rank from
t1) a where c2='P';
-------------------------+
c1 c2 num_ranks prime_rank
-------------------------+
1 P 1 1
2 P 2 2
3 P 3 3
5 P 4 4
7 P 5 5
11 P 6 6
13 P 7 7
17 P 8 8
19 P 9 9
-------------------------+
{noformat}
The plan indicates that the predicate c2='P' is incorrectly pushed to the scan
and affects the order of operations in the SQL statement.
{noformat}
Query: explain select * from (select c1, c2 , row_number() over(order by c1) as
num_ranks, row_number() over( partition by c2 order by c1) as prime_rank from
t1) a where c2='P'
+------------------------------------------------------------------------------------------+
| Explain String
|
+------------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=20.00MB Threads=2
|
| Per-Host Resource Estimates: Memory=30MB
|
| Codegen disabled by planner
|
| Analyzed query: SELECT * FROM (SELECT c1, c2, row_number() OVER (ORDER BY c1
|
| ASC) num_ranks, row_number() OVER (PARTITION BY c2 ORDER BY c1 ASC)
prime_rank |
| FROM `default`.t1) a WHERE CAST(c2 AS STRING) = 'P'
|
|
|
| F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|
| | Per-Host Resources: mem-estimate=30.00MB mem-reservation=20.00MB
thread-reservation=2 |
| PLAN-ROOT SINK
|
| | output exprs: c1, c2, row_number(), row_number()
|
| | mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB
thread-reservation=0 |
| |
|
| 04:ANALYTIC
|
| | functions: row_number()
|
| | order by: c1 ASC
|
| | window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
|
| | mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB
thread-reservation=0 |
| | tuple-ids=7,3 row-size=21B cardinality=10
|
| | in pipelines: 03(GETNEXT)
|
| |
|
| 03:SORT
|
| | order by: c1 ASC
|
| | mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB
thread-reservation=0 |
| | tuple-ids=7 row-size=13B cardinality=10
|
| | in pipelines: 03(GETNEXT), 01(OPEN)
|
| |
|
| 02:ANALYTIC
|
| | functions: row_number()
|
| | partition by: c2
|
| | order by: c1 ASC
|
| | window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
|
| | mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB
thread-reservation=0 |
| | tuple-ids=5,4 row-size=13B cardinality=10
|
| | in pipelines: 01(GETNEXT)
|
| |
|
| 01:SORT
|
| | order by: c2 ASC NULLS LAST, c1 ASC
|
| | mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB
thread-reservation=0 |
| | tuple-ids=5 row-size=5B cardinality=10
|
| | in pipelines: 01(GETNEXT), 00(OPEN)
|
| |
|
| 00:SCAN HDFS [default.t1]
|
| HDFS partitions=1/1 files=2 size=91B
|
| predicates: CAST(default.t1.c2 AS STRING) = 'P'
|
| stored statistics:
|
| table: rows=20 size=91B
|
| columns: all
|
| extrapolated-rows=disabled max-scan-range-rows=10
|
| mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=1
|
| tuple-ids=0 row-size=5B cardinality=10
|
| in pipelines: 00(GETNEXT)
|
+------------------------------------------------------------------------------------------+
{noformat}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]