[
https://issues.apache.org/jira/browse/IMPALA-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Riza Suminto resolved IMPALA-5851.
----------------------------------
Fix Version/s: Impala 4.3.0
Resolution: Duplicate
Resolving this as a duplicate of IMPALA-12395.
> Estimate number of rows for sum_init_zero scans should be number of files
> not table cardinality
> ------------------------------------------------------------------------------------------------
>
> Key: IMPALA-5851
> URL: https://issues.apache.org/jira/browse/IMPALA-5851
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Reporter: Mostafa Mokhtar
> Priority: Minor
> Fix For: Impala 4.3.0
>
>
> IMPALA-5036 introduced an optimization to use the data stored in the Parquet
> RowGroup.num_rows field for count(*) queries.
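> The estimated cardinality for the scan is the number of rows in the base table,
> as opposed to the number of files or row groups. In the plan and summary below,
> the scan is estimated at 179999978268 rows but returns only 28.98K, which is in
> line with files=28976.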
> {code}
> +-------------------------------------------------------------------------------+
> | Explain String                                                                |
> +-------------------------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=0B                                  |
> | Per-Host Resource Estimates: Memory=108.00MB                                  |
> |                                                                               |
> | F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                         |
> | |  Per-Host Resources: mem-estimate=10.00MB mem-reservation=0B                |
> | PLAN-ROOT SINK                                                                |
> | |  mem-estimate=0B mem-reservation=0B                                         |
> | |                                                                             |
> | 03:AGGREGATE [FINALIZE]                                                       |
> | |  output: count:merge(*)                                                     |
> | |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB                |
> | |  tuple-ids=1 row-size=8B cardinality=1                                      |
> | |                                                                             |
> | 02:EXCHANGE [UNPARTITIONED]                                                   |
> | |  mem-estimate=0B mem-reservation=0B                                         |
> | |  tuple-ids=1 row-size=8B cardinality=1                                      |
> | |                                                                             |
> | F00:PLAN FRAGMENT [RANDOM] hosts=130 instances=130                            |
> | Per-Host Resources: mem-estimate=98.00MB mem-reservation=0B                   |
> | 01:AGGREGATE                                                                  |
> | |  output: sum_init_zero(tpch_30000_parquet.lineitem.parquet-stats: num_rows) |
> | |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB                |
> | |  tuple-ids=1 row-size=8B cardinality=1                                      |
> | |                                                                             |
> | 00:SCAN HDFS [tpch_30000_parquet.lineitem, RANDOM]                            |
> |    partitions=2526/2526 files=28976 size=6.89TB                               |
> |    stats-rows=179999978268 extrapolated-rows=disabled                         |
> |    table stats: rows=179999978268 size=unavailable                            |
> |    column stats: all                                                          |
> |    mem-estimate=88.00MB mem-reservation=0B                                    |
> |    tuple-ids=0 row-size=8B cardinality=179999978268                           |
> +-------------------------------------------------------------------------------+
> {code}
> {code}
> +--------------+--------+----------+----------+--------+------------+-----------+---------------+-----------------------------+
> | Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail                      |
> +--------------+--------+----------+----------+--------+------------+-----------+---------------+-----------------------------+
> | 03:AGGREGATE | 1      | 1.28ms   | 1.28ms   | 1      | 1          | 532.00 KB | 10.00 MB      | FINALIZE                    |
> | 02:EXCHANGE  | 1      | 2.56s    | 2.56s    | 129    | 1          | 0 B       | 0 B           | UNPARTITIONED               |
> | 01:AGGREGATE | 129    | 4.89ms   | 62.84ms  | 129    | 1          | 20.00 KB  | 10.00 MB      |                             |
> | 00:SCAN HDFS | 129    | 62.44ms  | 341.03ms | 28.98K | 180.00B    | 1.75 MB   | 88.00 MB      | tpch_30000_parquet.lineitem |
> +--------------+--------+----------+----------+--------+------------+-----------+---------------+-----------------------------+
> {code}
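
The estimate requested in the issue can be sketched roughly as follows. This is a minimal Python illustration, not Impala's planner code (the Impala frontend is written in Java). `ScanStats` and `estimate_scan_cardinality` are hypothetical names. The assumption that the optimized scan returns about one row per file (or per row group, when that count is known) comes from the issue text and from the runtime summary, where the scan returned 28.98K rows for files=28976.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScanStats:
    """Hypothetical container for the statistics visible in the plan."""
    table_rows: int                        # "table stats: rows=..."
    num_files: int                         # "files=..."
    num_row_groups: Optional[int] = None   # assumed to be unknown at plan time in most cases


def estimate_scan_cardinality(stats: ScanStats, count_star_optimized: bool) -> int:
    """Return an estimated number of rows produced by the scan node.

    Assumption: when the count(*) optimization applies, the scan emits
    roughly one row per row group, so the file count is used as a
    stand-in whenever the row-group count is not available.
    """
    if not count_star_optimized:
        # Regular scan: fall back to the table row count.
        return stats.table_rows
    if stats.num_row_groups is not None:
        per_unit = stats.num_row_groups
    else:
        per_unit = stats.num_files
    # The scan can never produce more rows than the table holds.
    return min(per_unit, stats.table_rows)


# Values taken from the explain output in this issue.
lineitem = ScanStats(table_rows=179_999_978_268, num_files=28_976)
print(estimate_scan_cardinality(lineitem, count_star_optimized=True))
```

With the figures from this issue, the sketch yields 28976 for the optimized scan instead of 179999978268, which is in line with the 28.98K rows observed at runtime.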