[jira] [Created] (IMPALA-10697) NDV for rank() expression is incorrect

Aman Sinha (Jira) Sun, 09 May 2021 20:21:06 -0700

Aman Sinha created IMPALA-10697:
-----------------------------------

             Summary: NDV for rank() expression is incorrect
                 Key: IMPALA-10697
                 URL: https://issues.apache.org/jira/browse/IMPALA-10697
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
            Reporter: Aman Sinha



In the following query, the estimated cardinality of the final Aggregate is
always 1, regardless of the cardinality of its child. This appears to be because
the NDV of an analytic expr such as RANK is always computed as 1, which is
incorrect.
{noformat}

Query: explain select rnk, count(*) from (
select * from
 (SELECT rank() OVER (ORDER BY ss_net_profit ASC) rnk
    FROM store_sales ss1
    WHERE ss_store_sk = 4) v1
where rnk < 1000) v2
group by rnk
+------------------------------------------------------------------------------------------+
| Explain String                                                                           |
+------------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=13.94MB Threads=3                              |
| Per-Host Resource Estimates: Memory=142MB                                                |
| Analyzed query: SELECT rnk, count(*) FROM (SELECT * FROM (SELECT rank() OVER             |
| (ORDER BY ss_net_profit ASC) rnk FROM tpcds.store_sales ss1 WHERE ss_store_sk =          |
| CAST(4 AS INT)) v1 WHERE rnk < CAST(1000 AS BIGINT)) v2 GROUP BY rnk                     |
|                                                                                          |
| F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                                    |
| |  Per-Host Resources: mem-estimate=14.01MB mem-reservation=5.94MB thread-reservation=1  |
| PLAN-ROOT SINK                                                                           |
| |  output exprs: rnk, count(*)                                                           |
| |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0   |
| |                                                                                        |
| 04:AGGREGATE [FINALIZE]                                                                  |
| |  output: count(*)                                                                      |
| |  group by: rank()                                                                      |
| |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0 |
| |  tuple-ids=5 row-size=16B cardinality=1                                                |
| |  in pipelines: 04(GETNEXT), 06(OPEN)                                                   |
| |                                                                                        |
| 03:SELECT                                                                                |
| |  predicates: rank() < CAST(1000 AS BIGINT)                                             |
| |  mem-estimate=0B mem-reservation=0B thread-reservation=0                               |
| |  tuple-ids=8,7 row-size=16B cardinality=999                                            |
| |  in pipelines: 06(GETNEXT)                                                             |
| |                                                                                        |
| 02:ANALYTIC                                                                              |
| |  functions: rank()                                                                     |
| |  order by: ss_net_profit ASC                                                           |
| |  window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW                             |
| |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0   |
| |  tuple-ids=8,7 row-size=16B cardinality=999                                            |
| |  in pipelines: 06(GETNEXT)                                                             |
| |                                                                                        |
| 06:TOP-N                                                                                 |
| |  order by: ss_net_profit ASC                                                           |
| |  limit with ties: 999                                                                  |
| |  mem-estimate=7.80KB mem-reservation=0B thread-reservation=0                           |
| |  tuple-ids=8 row-size=8B cardinality=999                                               |
| |  in pipelines: 06(GETNEXT), 01(OPEN)                                                   |
| |                                                                                        |
| 05:EXCHANGE [UNPARTITIONED]                                                              |
| |  mem-estimate=37.72KB mem-reservation=0B thread-reservation=0                          |
| |  tuple-ids=8 row-size=8B cardinality=999                                               |
| |  in pipelines: 01(GETNEXT)                                                             |
| |                                                                                        |
| F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3                                           |
| Per-Host Resources: mem-estimate=128.01MB mem-reservation=8.00MB thread-reservation=2    |
| 01:TOP-N                                                                                 |
| |  order by: ss_net_profit ASC                                                           |
| |  limit with ties: 999                                                                  |
| |  source expr: rank() < CAST(1000 AS BIGINT)                                            |
| |  mem-estimate=7.80KB mem-reservation=0B thread-reservation=0                           |
| |  tuple-ids=8 row-size=8B cardinality=999                                               |
| |  in pipelines: 01(GETNEXT), 00(OPEN)                                                   |
| |                                                                                        |
| 00:SCAN HDFS [tpcds.store_sales ss1, RANDOM]                                             |
|    HDFS partitions=1824/1824 files=1824 size=346.60MB                                    |
|    predicates: ss_store_sk = CAST(4 AS INT)                                              |
|    stored statistics:                                                                    |
|      table: rows=2.88M size=346.60MB                                                     |
|      partitions: 1824/1824 rows=2.88M                                                    |
|      columns: all                                                                        |
|    extrapolated-rows=disabled max-scan-range-rows=130.09K                                |
|    mem-estimate=128.00MB mem-reservation=8.00MB thread-reservation=1                     |
|    tuple-ids=0 row-size=8B cardinality=480.07K                                           |
|    in pipelines: 00(GETNEXT)                                                             |
+------------------------------------------------------------------------------------------+
{noformat}
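
To illustrate why an NDV of 1 for rank() collapses the estimate, here is a
minimal Python sketch of a common group-by cardinality model: multiply the NDVs
of the grouping exprs and cap the product at the input cardinality. This is a
simplified assumption for illustration, not Impala's actual planner code, and
the function names are hypothetical. The value 999 is the cardinality of the
03:SELECT node in the plan above.

```python
def estimate_group_by_cardinality(input_cardinality, grouping_ndvs):
    """Hypothetical model: product of grouping-expr NDVs, capped at the input size."""
    estimate = 1
    for ndv in grouping_ndvs:
        # Treat unknown or non-positive NDVs as 1 so the product stays defined.
        estimate *= max(ndv, 1)
    return min(estimate, input_cardinality)


def rank_ndv_upper_bound(input_cardinality):
    """Hypothetical bound: rank() yields at most one distinct value per input row."""
    return max(input_cardinality, 1)


# Cardinality of the child of the final Aggregate (03:SELECT in the plan).
select_cardinality = 999

# With NDV(rank()) fixed at 1, the model reproduces the reported estimate.
reported = estimate_group_by_cardinality(select_cardinality, [1])

# With NDV(rank()) bounded by the input cardinality, the estimate tracks the child.
bounded = estimate_group_by_cardinality(
    select_cardinality, [rank_ndv_upper_bound(select_cardinality)]
)

print(reported, bounded)
```

Under this model, the estimate for the Aggregate can only grow with its child if
the NDV of rank() is allowed to exceed 1. Bounding it by the input cardinality
is one conservative option, since ties can only reduce the number of distinct
ranks.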




