Aman Sinha created IMPALA-10697:
-----------------------------------
Summary: NDV for rank() expression is incorrect
Key: IMPALA-10697
URL: https://issues.apache.org/jira/browse/IMPALA-10697
Project: IMPALA
Issue Type: Bug
Components: Frontend
Reporter: Aman Sinha
In the following query, the estimated cardinality of the final Aggregate is
always 1, regardless of the cardinality of its child (04:AGGREGATE shows
cardinality=1 while its child 03:SELECT shows cardinality=999). This appears to
be because the NDV of an analytic expr such as RANK is always computed as 1,
which is incorrect.
{noformat}
Query: explain select rnk, count(*) from (
select * from
(SELECT rank() OVER (ORDER BY ss_net_profit ASC) rnk
FROM store_sales ss1
WHERE ss_store_sk = 4) v1
where rnk < 1000) v2
group by rnk
+------------------------------------------------------------------------------------------+
| Explain String
+------------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=13.94MB Threads=3
| Per-Host Resource Estimates: Memory=142MB
| Analyzed query: SELECT rnk, count(*) FROM (SELECT * FROM (SELECT rank() OVER
| (ORDER BY ss_net_profit ASC) rnk FROM tpcds.store_sales ss1 WHERE ss_store_sk =
| CAST(4 AS INT)) v1 WHERE rnk < CAST(1000 AS BIGINT)) v2 GROUP BY rnk
|
| F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| |  Per-Host Resources: mem-estimate=14.01MB mem-reservation=5.94MB thread-reservation=1
| PLAN-ROOT SINK
| |  output exprs: rnk, count(*)
| |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
| |
| 04:AGGREGATE [FINALIZE]
| |  output: count(*)
| |  group by: rank()
| |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
| |  tuple-ids=5 row-size=16B cardinality=1
| |  in pipelines: 04(GETNEXT), 06(OPEN)
| |
| 03:SELECT
| |  predicates: rank() < CAST(1000 AS BIGINT)
| |  mem-estimate=0B mem-reservation=0B thread-reservation=0
| |  tuple-ids=8,7 row-size=16B cardinality=999
| |  in pipelines: 06(GETNEXT)
| |
| 02:ANALYTIC
| |  functions: rank()
| |  order by: ss_net_profit ASC
| |  window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
| |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
| |  tuple-ids=8,7 row-size=16B cardinality=999
| |  in pipelines: 06(GETNEXT)
| |
| 06:TOP-N
| |  order by: ss_net_profit ASC
| |  limit with ties: 999
| |  mem-estimate=7.80KB mem-reservation=0B thread-reservation=0
| |  tuple-ids=8 row-size=8B cardinality=999
| |  in pipelines: 06(GETNEXT), 01(OPEN)
| |
| 05:EXCHANGE [UNPARTITIONED]
| |  mem-estimate=37.72KB mem-reservation=0B thread-reservation=0
| |  tuple-ids=8 row-size=8B cardinality=999
| |  in pipelines: 01(GETNEXT)
| |
| F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
| Per-Host Resources: mem-estimate=128.01MB mem-reservation=8.00MB thread-reservation=2
| 01:TOP-N
| |  order by: ss_net_profit ASC
| |  limit with ties: 999
| |  source expr: rank() < CAST(1000 AS BIGINT)
| |  mem-estimate=7.80KB mem-reservation=0B thread-reservation=0
| |  tuple-ids=8 row-size=8B cardinality=999
| |  in pipelines: 01(GETNEXT), 00(OPEN)
| |
| 00:SCAN HDFS [tpcds.store_sales ss1, RANDOM]
|    HDFS partitions=1824/1824 files=1824 size=346.60MB
|    predicates: ss_store_sk = CAST(4 AS INT)
|    stored statistics:
|      table: rows=2.88M size=346.60MB
|      partitions: 1824/1824 rows=2.88M
|      columns: all
|    extrapolated-rows=disabled max-scan-range-rows=130.09K
|    mem-estimate=128.00MB mem-reservation=8.00MB thread-reservation=1
|    tuple-ids=0 row-size=8B cardinality=480.07K
|    in pipelines: 00(GETNEXT)
+------------------------------------------------------------------------------------------+
{noformat}
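To illustrate why an NDV of 1 for rank() pins the aggregate estimate at 1, here is a minimal sketch. It assumes the grouping-aggregate cardinality is modeled as the product of the grouping exprs' NDVs, capped by the input cardinality. That is a common estimation model and may not match the planner's exact formula. The function name and the bounded NDV value are hypothetical, chosen only for illustration. The one firm fact used is that rank() over N input rows can produce at most N distinct values.

```python
def estimate_agg_cardinality(grouping_ndvs, input_cardinality):
    """Estimate output rows of a grouping aggregation.

    Assumed model (not necessarily the planner's exact formula):
    product of the grouping exprs' NDVs, capped by the input cardinality.
    """
    product = 1
    for ndv in grouping_ndvs:
        product *= ndv
    return min(product, input_cardinality)


# Child (03:SELECT) cardinality taken from the plan above.
child_cardinality = 999

# If rank() is assigned NDV = 1, the estimate collapses to 1
# no matter how many rows the child produces.
with_ndv_one = estimate_agg_cardinality([1], child_cardinality)

# Hypothetical alternative: bound the NDV of rank() by the child cardinality,
# since rank() over N rows yields at most N distinct values.
with_bounded_ndv = estimate_agg_cardinality([child_cardinality], child_cardinality)

print(with_ndv_one, with_bounded_ndv)
```

Under this model, any NDV greater than 1 for rank() would let the aggregate estimate track the child cardinality instead of staying fixed at 1.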