[ 
https://issues.apache.org/jira/browse/IMPALA-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riza Suminto resolved IMPALA-5851.
----------------------------------
    Fix Version/s: Impala 4.3.0
       Resolution: Duplicate

Resolving this as a duplicate of IMPALA-12395.

> Estimate number of rows for sum_init_zero scans should be number of files not table cardinality
> ------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-5851
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5851
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Mostafa Mokhtar
>            Priority: Minor
>             Fix For: Impala 4.3.0
>
>
> IMPALA-5036 introduced an optimization that uses the data stored in the Parquet
> RowGroup.num_rows field for count(*) queries.
> However, the estimated cardinality for the scan is the number of rows in the base
> table, as opposed to the number of files or row groups.
> {code}
> +-------------------------------------------------------------------------------+
> | Explain String                                                                |
> | Max Per-Host Resource Reservation: Memory=0B                                  |
> | Per-Host Resource Estimates: Memory=108.00MB                                  |
> |                                                                               |
> | F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                         |
> | |  Per-Host Resources: mem-estimate=10.00MB mem-reservation=0B                |
> | PLAN-ROOT SINK                                                                |
> | |  mem-estimate=0B mem-reservation=0B                                         |
> | |                                                                             |
> | 03:AGGREGATE [FINALIZE]                                                       |
> | |  output: count:merge(*)                                                     |
> | |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB                |
> | |  tuple-ids=1 row-size=8B cardinality=1                                      |
> | |                                                                             |
> | 02:EXCHANGE [UNPARTITIONED]                                                   |
> | |  mem-estimate=0B mem-reservation=0B                                         |
> | |  tuple-ids=1 row-size=8B cardinality=1                                      |
> | |                                                                             |
> | F00:PLAN FRAGMENT [RANDOM] hosts=130 instances=130                            |
> | Per-Host Resources: mem-estimate=98.00MB mem-reservation=0B                   |
> | 01:AGGREGATE                                                                  |
> | |  output: sum_init_zero(tpch_30000_parquet.lineitem.parquet-stats: num_rows) |
> | |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB                |
> | |  tuple-ids=1 row-size=8B cardinality=1                                      |
> | |                                                                             |
> | 00:SCAN HDFS [tpch_30000_parquet.lineitem, RANDOM]                            |
> |    partitions=2526/2526 files=28976 size=6.89TB                               |
> |    stats-rows=179999978268 extrapolated-rows=disabled                         |
> |    table stats: rows=179999978268 size=unavailable                            |
> |    column stats: all                                                          |
> |    mem-estimate=88.00MB mem-reservation=0B                                    |
> |    tuple-ids=0 row-size=8B cardinality=179999978268                           |
> +-------------------------------------------------------------------------------+
> {code}
> {code}
> +--------------+--------+----------+----------+--------+------------+-----------+---------------+-----------------------------+
> | Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail                      |
> +--------------+--------+----------+----------+--------+------------+-----------+---------------+-----------------------------+
> | 03:AGGREGATE | 1      | 1.28ms   | 1.28ms   | 1      | 1          | 532.00 KB | 10.00 MB      | FINALIZE                    |
> | 02:EXCHANGE  | 1      | 2.56s    | 2.56s    | 129    | 1          | 0 B       | 0 B           | UNPARTITIONED               |
> | 01:AGGREGATE | 129    | 4.89ms   | 62.84ms  | 129    | 1          | 20.00 KB  | 10.00 MB      |                             |
> | 00:SCAN HDFS | 129    | 62.44ms  | 341.03ms | 28.98K | 180.00B    | 1.75 MB   | 88.00 MB      | tpch_30000_parquet.lineitem |
> +--------------+--------+----------+----------+--------+------------+-----------+---------------+-----------------------------+
> {code}
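To make the requested change concrete, here is a minimal Python sketch of the estimate the issue asks for. It assumes, as the runtime summary suggests (actual #Rows of 28.98K for the scan against files=28976), that the count(*)-optimized scan emits one row per Parquet file. `ScanStats` and `count_star_scan_cardinality` are hypothetical names used only for illustration; they are not part of Impala.

```python
from dataclasses import dataclass


@dataclass
class ScanStats:
    """Hypothetical summary of a Parquet scan's inputs (not an Impala API)."""
    table_rows: int  # "table stats: rows=..." in the SCAN HDFS node
    num_files: int   # "files=..." in the SCAN HDFS node


def count_star_scan_cardinality(stats: ScanStats, count_star_optimized: bool) -> int:
    """Estimate how many rows a Parquet scan feeds into its parent aggregate.

    Assumption: with the IMPALA-5036 count(*) optimization, the scan reads
    RowGroup.num_rows from the file footers and emits one row per file,
    which the sum_init_zero() aggregate then adds up. Without the
    optimization, the estimate falls back to the table's row count.
    """
    if count_star_optimized:
        return stats.num_files
    return stats.table_rows


# Numbers taken from the 00:SCAN HDFS node in the plan above.
lineitem = ScanStats(table_rows=179_999_978_268, num_files=28_976)
reported = lineitem.table_rows  # cardinality=179999978268 in the plan
proposed = count_star_scan_cardinality(lineitem, count_star_optimized=True)
print(f"reported estimate: {reported:,}")
print(f"proposed estimate: {proposed:,}")
print(f"overestimate factor: {reported / proposed:,.0f}x")
```

With the figures from the plan, the reported scan cardinality overestimates the scan's actual output by a factor of roughly 6.2 million.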



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
