[
https://issues.apache.org/jira/browse/IMPALA-11265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gabor Kaszab resolved IMPALA-11265.
-----------------------------------
Fix Version/s: Impala 4.5.0
Resolution: Fixed
This ticket covers the catalogd side of the optimizations. The coordinator
side has been moved out into a separate ticket:
https://issues.apache.org/jira/browse/IMPALA-13673
> Iceberg tables have a large memory footprint in catalog cache
> -------------------------------------------------------------
>
> Key: IMPALA-11265
> URL: https://issues.apache.org/jira/browse/IMPALA-11265
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Gabor Kaszab
> Priority: Major
> Labels: impala-iceberg
> Fix For: Impala 4.5.0
>
>
> During the investigation of IMPALA-11260, I found the cache item size of a
> (IcebergApiTableCacheKey, org.apache.iceberg.BaseTable) pair could be 30MB.
> For instance, here are the cache items of the Iceberg table
> {{functional_parquet.iceberg_partitioned}}:
> {code:java}
> weigh=3792, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$TableCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$TableMetaRefImpl
> weigh=14960, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$IcebergMetaCacheKey,
> valueClass=class org.apache.impala.thrift.TPartialTableInfo
> weigh=30546992, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$IcebergApiTableCacheKey,
> valueClass=class org.apache.iceberg.BaseTable
> weigh=496, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=496, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=496, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=512, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=472, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionListCacheKey,
> valueClass=class java.util.ArrayList
> weigh=10328, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl{code}
> Note that this table has just 20 rows. The total memory footprint is
> 30MB.
> For a normal partitioned Parquet table, the memory footprint is not that
> large. For instance, here are the cache items for
> {{functional_parquet.alltypes}}:
> {code:java}
> weigh=4216, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$TableCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$TableMetaRefImpl
> weigh=480, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=472, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=488, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=488, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=480, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=488, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=488, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=488, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=488, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=488, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=496, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=352, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=352, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=4248, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionListCacheKey,
> valueClass=class java.util.ArrayList
> weigh=1296, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
> valueClass=class
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl{code}
> The total size is around 45KB.
> It is worth double-checking whether we need the whole
> org.apache.iceberg.BaseTable object. Maybe we can extract just what Impala
> needs into a custom value class.
> CC [~boroknagyz]
> *Update:*
> I'm not sure how that measurement was performed, but I don't think the
> difference between the table types comes from the BaseTable object. It comes
> from the way we store the tables, most importantly how we store file
> descriptors. I've taken measurements on catalogd and also on the coordinator
> in local-catalog mode. You can find the numbers in this doc:
> [https://docs.google.com/spreadsheets/d/1bTwH6wwy6CVjy1nNw-FGbec8Fi-KM9meFNC8-wtRncI/edit?usp=sharing]
>
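As a rough cross-check of the totals quoted in the description, the per-entry weights in such a dump can be summed per key class. The sketch below is illustrative only: `sum_weights` and `ENTRY_RE` are hypothetical helper names, not part of Impala, and the input is the {{functional_parquet.iceberg_partitioned}} dump from the description with the valueClass parts left out for brevity.

```python
import re
from collections import defaultdict

# One cache entry: its weight and the simple name of the key class
# (the part after the '$' of the nested class name).
ENTRY_RE = re.compile(r"weigh=(\d+),\s*keyClass=class\s+\S*\$(\w+)")


def sum_weights(dump: str) -> dict:
    """Return the total weight in bytes per key class for a cache dump."""
    totals = defaultdict(int)
    # Collapse whitespace so entries wrapped across lines still match.
    flat = " ".join(dump.split())
    for weight, key_class in ENTRY_RE.findall(flat):
        totals[key_class] += int(weight)
    return dict(totals)


# Weights copied from the iceberg_partitioned dump in the description.
ICEBERG_DUMP = """
weigh=3792, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$TableCacheKey,
weigh=14960, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$IcebergMetaCacheKey,
weigh=30546992, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$IcebergApiTableCacheKey,
weigh=496, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
weigh=496, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
weigh=496, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
weigh=512, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
weigh=472, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionListCacheKey,
weigh=10328, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
"""

if __name__ == "__main__":
    totals = sum_weights(ICEBERG_DUMP)
    grand_total = sum(totals.values())
    # Print the largest contributors first.
    for key_class, weight in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{key_class:28s} {weight:>10d}  {weight / grand_total:6.1%}")
    print(f"{'total':28s} {grand_total:>10d}")
```

For this dump the weights add up to 30,578,544 bytes, of which the IcebergApiTableCacheKey entry accounts for 30,546,992. Summing the {{functional_parquet.alltypes}} weights from the description by hand gives 45,536 bytes, consistent with the "around 45KB" figure. These sums only restate the reported weights; they say nothing about how those weights were measured.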