[
https://issues.apache.org/jira/browse/IMPALA-11969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Manish Maheshwari reassigned IMPALA-11969:
------------------------------------------
Assignee: Sai Hemanth Gantasala
> Move incremental statistics out of the partition params to reduce partition's
> memory footprint
> ----------------------------------------------------------------------------------------------
>
> Key: IMPALA-11969
> URL: https://issues.apache.org/jira/browse/IMPALA-11969
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog, Frontend
> Affects Versions: Impala 4.2.0
> Reporter: Aman Sinha
> Assignee: Sai Hemanth Gantasala
> Priority: Major
>
> Impala stores incremental stats ([1] and [2]) within the partition params of
> each partition object. The total bytes consumed by the incremental stats is
> estimated as:
> {noformat}
> 200 bytes per column * num_columns * num_partitions bytes
> {noformat}
> This means that for wide tables the partition object can get bloated and for
> a few thousand partitions, this adds up. It also affects Hive because the
> same partition objects are fetched by Hive and can lead to memory pressure
> even though Hive does not need those incremental stats.
> The intent of the partition parameters in a Partition object was primarily to
> keep certain properties or small objects. We should move incremental stats
> out of the partition params and store them separately in HMS. There is a
> PART_COL_STATS in HMS that could potentially be used (with some redesign of
> the schema) or a separate table could be added.
> Additional notes:
> * Only the latest version of incremental stats is stored. They are stored in
> partition parameters using keys like "impala_intermediate_stats_num_chunks"
> and "impala_intermediate_stats_chunk0", "impala_intermediate_stats_chunk1",
> "impala_intermediate_stats_chunk2", etc. The chunks are used to cap each
> value size to be < 4000. [3]
> * Impala compresses the incr stats of all columns into a byte array for each
> partition. It'd be nice if HMS can compress them as well to save space.
> [1]
> https://github.com/apache/impala/blob/master/common/thrift/CatalogObjects.thrift#L261
> [2]
> https://github.com/apache/impala/blob/master/common/thrift/CatalogObjects.thrift#L232
> [3]
> https://github.com/apache/impala/blob/419aa2e30db326f02e9b4ec563ef7864e82df86e/fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java#L182
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]