[
https://issues.apache.org/jira/browse/IMPALA-11030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17453829#comment-17453829
]
Aman Sinha commented on IMPALA-11030:
-------------------------------------
Gerrit review: https://gerrit.cloudera.org/c/18072/
> Wrong result due to predicate pushdown into inline view with Analytic function
> ------------------------------------------------------------------------------
>
> Key: IMPALA-11030
> URL: https://issues.apache.org/jira/browse/IMPALA-11030
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.4.0
> Reporter: Aman Sinha
> Assignee: Aman Sinha
> Priority: Major
>
> Table DDL and population:
> {noformat}
> create table t1( c1 int, c2 char(1));
> insert into t1 values (1,cast('P' as char(1))),(2,cast('P' as
> char(1))),(3,cast('P' as char(1))),(4,cast('N' as char(1))),(5,cast('P' as
> char(1))),(6, cast('N' as char(1))),(7, cast('P' as char(1))),(8, cast('N' as
> char(1))),(9, cast('N' as char(1))),(10, cast('N' as char(1))), (11,cast('P'
> as char(1))),(12,cast('N' as char(1))),(13,cast('P' as char(1))),(14,cast('N'
> as char(1))),(15,cast('N' as char(1))),(16, cast('N' as char(1))),(17,
> cast('P' as char(1))),(18, cast('N' as char(1))),(19, cast('P' as
> char(1))),(20, cast('N' as char(1)));
> {noformat}
> {noformat}
> default> select * from t1;
> ------+
> c1 c2
> ------+
> 11 P
> 12 N
> 13 P
> 14 N
> 15 N
> 16 N
> 17 P
> 18 N
> 19 P
> 20 N
> 1 P
> 2 P
> 3 P
> 4 N
> 5 P
> 6 N
> 7 P
> 8 N
> 9 N
> 10 N
> ------+
> The following query produces a wrong num_row() for num_ranks column.
> default> select * from (select c1, c2 , row_number() over(order by c1) as
> num_ranks, row_number() over( partition by c2 order by c1) as prime_rank from
> t1) a where c2='P';
> -------------------------+
> c1 c2 num_ranks prime_rank
> -------------------------+
> 1 P 1 1
> 2 P 2 2
> 3 P 3 3
> 5 P 4 4
> 7 P 5 5
> 11 P 6 6
> 13 P 7 7
> 17 P 8 8
> 19 P 9 9
> -------------------------+
> {noformat}
> The plan indicates that the predicate c2='P' is incorrectly pushed to the
> scan and affects the order of operations in the SQL statement.
> {noformat}
> Query: explain select * from (select c1, c2 , row_number() over(order by c1)
> as num_ranks, row_number() over( partition by c2 order by c1) as prime_rank
> from t1) a where c2='P'
> +------------------------------------------------------------------------------------------+
> | Explain String
> |
> +------------------------------------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=20.00MB Threads=2
> |
> | Per-Host Resource Estimates: Memory=30MB
> |
> | Codegen disabled by planner
> |
> | Analyzed query: SELECT * FROM (SELECT c1, c2, row_number() OVER (ORDER BY
> c1 |
> | ASC) num_ranks, row_number() OVER (PARTITION BY c2 ORDER BY c1 ASC)
> prime_rank |
> | FROM `default`.t1) a WHERE CAST(c2 AS STRING) = 'P'
> |
> |
> |
> | F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
> |
> | | Per-Host Resources: mem-estimate=30.00MB mem-reservation=20.00MB
> thread-reservation=2 |
> | PLAN-ROOT SINK
> |
> | | output exprs: c1, c2, row_number(), row_number()
> |
> | | mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB
> thread-reservation=0 |
> | |
> |
> | 04:ANALYTIC
> |
> | | functions: row_number()
> |
> | | order by: c1 ASC
> |
> | | window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
> |
> | | mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB
> thread-reservation=0 |
> | | tuple-ids=7,3 row-size=21B cardinality=10
> |
> | | in pipelines: 03(GETNEXT)
> |
> | |
> |
> | 03:SORT
> |
> | | order by: c1 ASC
> |
> | | mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB
> thread-reservation=0 |
> | | tuple-ids=7 row-size=13B cardinality=10
> |
> | | in pipelines: 03(GETNEXT), 01(OPEN)
> |
> | |
> |
> | 02:ANALYTIC
> |
> | | functions: row_number()
> |
> | | partition by: c2
> |
> | | order by: c1 ASC
> |
> | | window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
> |
> | | mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB
> thread-reservation=0 |
> | | tuple-ids=5,4 row-size=13B cardinality=10
> |
> | | in pipelines: 01(GETNEXT)
> |
> | |
> |
> | 01:SORT
> |
> | | order by: c2 ASC NULLS LAST, c1 ASC
> |
> | | mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB
> thread-reservation=0 |
> | | tuple-ids=5 row-size=5B cardinality=10
> |
> | | in pipelines: 01(GETNEXT), 00(OPEN)
> |
> | |
> |
> | 00:SCAN HDFS [default.t1]
> |
> | HDFS partitions=1/1 files=2 size=91B
> |
> | predicates: CAST(default.t1.c2 AS STRING) = 'P'
> |
> | stored statistics:
> |
> | table: rows=20 size=91B
> |
> | columns: all
> |
> | extrapolated-rows=disabled max-scan-range-rows=10
> |
> | mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=1
> |
> | tuple-ids=0 row-size=5B cardinality=10
> |
> | in pipelines: 00(GETNEXT)
> |
> +------------------------------------------------------------------------------------------+
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]