[
https://issues.apache.org/jira/browse/IMPALA-13794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Fehr updated IMPALA-13794:
--------------------------------
Description:
Currently we use HdfsTable's memory usage estimates for Iceberg tables, even
though they have different characteristics.
We should come up with a calculation that is better suited for Iceberg tables.
The current estimate leaves out several things, e.g.:
* IcebergFileDescriptor objects are larger than plain FileDescriptor objects,
especially when the table is partitioned
* the overhead of IcebergContentFileStore's data structures
* V3: Deletion Vectors stored in IcebergContentFileStore.dataFileToDV_
* the internal Iceberg BaseTable object
We also need to cross-check the estimate against the Jamm weigher (which we use
in our caches).
was:
Currently we use HdfsTable's memory usage estimates for Iceberg tables, even
though they have different characteristics.
We should come up with a calculation that is better suited for Iceberg tables.
> Calculate memory usage estimates more precisely for Iceberg tables
> ------------------------------------------------------------------
>
> Key: IMPALA-13794
> URL: https://issues.apache.org/jira/browse/IMPALA-13794
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Reporter: Zoltán Borók-Nagy
> Assignee: Jason Fehr
> Priority: Major
> Labels: impala-iceberg, ramp-up
>
> Currently we use HdfsTable's memory usage estimates for Iceberg tables, even
> though they have different characteristics.
> We should come up with a calculation that is better suited for Iceberg tables.
> The current estimate leaves out several things, e.g.:
> * IcebergFileDescriptor objects are larger than plain FileDescriptor objects,
> especially when the table is partitioned
> * the overhead of IcebergContentFileStore's data structures
> * V3: Deletion Vectors stored in IcebergContentFileStore.dataFileToDV_
> * the internal Iceberg BaseTable object
> We also need to cross-check the estimate against the Jamm weigher (which we use
> in our caches).
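
For illustration only, the factors listed in the description could be combined
into a single estimate roughly as follows. This is a minimal sketch, not
Impala's code: the class, the TableStats holder, and every byte constant are
hypothetical placeholders, and real values would have to come from measurement.

```java
public class IcebergMemoryEstimateSketch {
    // All constants are hypothetical placeholders, not measured Impala values.
    static final long BASE_FILE_DESCRIPTOR_BYTES = 200;      // plain FileDescriptor
    static final long ICEBERG_METADATA_BYTES_PER_FILE = 150; // extra Iceberg metadata
    static final long PARTITION_BYTES_PER_FIELD = 40;        // per partition field, per file
    static final long MAP_ENTRY_OVERHEAD_BYTES = 48;         // per entry in a lookup map

    /** Hypothetical summary of the inputs named in the issue description. */
    public static final class TableStats {
        final long numContentFiles;
        final int numPartitionFields;
        final long numDeletionVectors;
        final long avgDeletionVectorBytes;
        final long baseTableBytes;

        public TableStats(long numContentFiles, int numPartitionFields,
                long numDeletionVectors, long avgDeletionVectorBytes,
                long baseTableBytes) {
            this.numContentFiles = numContentFiles;
            this.numPartitionFields = numPartitionFields;
            this.numDeletionVectors = numDeletionVectors;
            this.avgDeletionVectorBytes = avgDeletionVectorBytes;
            this.baseTableBytes = baseTableBytes;
        }
    }

    /** Sums the four contributions: descriptors, store overhead, DVs, BaseTable. */
    public static long estimateBytes(TableStats s) {
        // Per-file descriptor cost grows with the number of partition fields.
        long perFile = BASE_FILE_DESCRIPTOR_BYTES + ICEBERG_METADATA_BYTES_PER_FILE
                + PARTITION_BYTES_PER_FIELD * s.numPartitionFields;
        long descriptors = s.numContentFiles * perFile;
        // One map entry per content file for the store's lookup structures.
        long storeOverhead = s.numContentFiles * MAP_ENTRY_OVERHEAD_BYTES;
        // One map entry plus payload per deletion vector (V3 tables only).
        long deletionVectors = s.numDeletionVectors
                * (MAP_ENTRY_OVERHEAD_BYTES + s.avgDeletionVectorBytes);
        return descriptors + storeOverhead + deletionVectors + s.baseTableBytes;
    }

    public static void main(String[] args) {
        TableStats stats = new TableStats(1000, 2, 100, 64, 50_000);
        System.out.println(estimateBytes(stats));
    }
}
```

An estimate of this shape would then be compared against a Jamm deep-size
measurement of the same table, and the constants adjusted until the two agree
reasonably well.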