[
https://issues.apache.org/jira/browse/IMPALA-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pooja Nilangekar reassigned IMPALA-7653:
----------------------------------------
Assignee: Paul Rogers (was: Pooja Nilangekar)
> Improve accuracy of compute incremental stats cardinality estimation
> --------------------------------------------------------------------
>
> Key: IMPALA-7653
> URL: https://issues.apache.org/jira/browse/IMPALA-7653
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 3.0
> Reporter: Balazs Jeszenszky
> Assignee: Paul Rogers
> Priority: Major
> Labels: resource-management
>
> Currently, the operators in the subquery of a compute [incremental] stats
> statement estimate cardinality from combined selectivities, as for any other
> query, e.g. during aggregation. For example, note the expected cardinality of
> the aggregation in this subquery:
> {code}
> F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=4
> Per-Host Resources: mem-estimate=305.20GB mem-reservation=136.00MB
> 01:AGGREGATE [STREAMING]
> | output: [...]
> | group by: col_a, col_b, col_c
> | mem-estimate=76.21GB mem-reservation=34.00MB spill-buffer=2.00MB
> | tuple-ids=1 row-size=104.83KB cardinality=693000
> |
> 00:SCAN HDFS [default.test, RANDOM]
> partitions=1/554 files=1 size=109.65MB
> stats-rows=1506374 extrapolated-rows=disabled
> table stats: rows=821958291 size=unavailable
> column stats: all
> mem-estimate=88.00MB mem-reservation=0B
> tuple-ids=0 row-size=2.06KB cardinality=1506374
> {code}
> This plan was generated by running compute incremental stats on a single
> partition, so the output of that aggregation is a single row. Because the
> intermediate rows are so wide, such overestimates lead to bloated memory
> estimates. Since the number of partitions to be updated is known at
> plan time, Impala could use it to set the aggregation's cardinality.
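
The proposal can be illustrated with a minimal sketch. It is not Impala's planner code: the function names are hypothetical, the per-column NDVs are made up so that their product matches the 693000 shown in the plan, and it assumes the grouping columns are the partition keys, so the aggregation emits at most one row per partition being updated.

```python
from math import prod


def ndv_based_estimate(group_ndvs, input_rows):
    """Generic-style estimate: product of the grouping columns' NDVs,
    capped at the number of input rows."""
    return min(prod(group_ndvs), input_rows)


def partition_capped_estimate(group_ndvs, input_rows, partitions_to_update):
    """Sketch of the proposed cap: when grouping on partition keys, the
    aggregation cannot produce more rows than partitions being updated."""
    return min(ndv_based_estimate(group_ndvs, input_rows), partitions_to_update)


# Hypothetical NDVs for col_a, col_b, col_c (illustration only).
ndvs = [100, 90, 77]
scan_rows = 1506374  # scan cardinality taken from the plan above

print(ndv_based_estimate(ndvs, scan_rows))            # generic estimate
print(partition_capped_estimate(ndvs, scan_rows, 1))  # one partition updated
```

With a row size above 100KB, the gap between the two estimates is what drives the difference in the aggregation's memory estimate.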