Quanlong Huang created IMPALA-11265:
---------------------------------------
Summary: Iceberg tables have a large memory footprint in catalog
cache
Key: IMPALA-11265
URL: https://issues.apache.org/jira/browse/IMPALA-11265
Project: IMPALA
Issue Type: Improvement
Components: Catalog
Reporter: Quanlong Huang
During the investigation of IMPALA-11260, I found that the cache item for an
(IcebergApiTableCacheKey, org.apache.iceberg.BaseTable) pair can weigh about 30MB.
For instance, here are the cache items of the iceberg table
{{functional_parquet.iceberg_partitioned}}:
{code:java}
weigh=3792, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$TableCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$TableMetaRefImpl
weigh=14960, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$IcebergMetaCacheKey,
valueClass=class org.apache.impala.thrift.TPartialTableInfo
weigh=30546992, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$IcebergApiTableCacheKey,
valueClass=class org.apache.iceberg.BaseTable
weigh=496, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
weigh=496, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
weigh=496, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
weigh=512, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
weigh=472, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionListCacheKey,
valueClass=class java.util.ArrayList
weigh=10328, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl{code}
Note that this table has just 20 rows, yet its total memory footprint is about 30MB.
For a normal partitioned Parquet table, the memory footprint is nowhere near that
large. For instance, here are the cache items for
{{functional_parquet.alltypes}}:
{code:java}
weigh=4216, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$TableCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$TableMetaRefImpl
weigh=480, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
weigh=472, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
weigh=488, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
weigh=488, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
weigh=480, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
weigh=488, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
weigh=488, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
weigh=488, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
weigh=488, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
weigh=488, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
weigh=496, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
weigh=352, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
weigh=352, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey,
valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
weigh=4248, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionListCacheKey,
valueClass=class java.util.ArrayList
weigh=1296, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
weigh=1296, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
weigh=1288, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
weigh=1296, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
weigh=1296, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
weigh=1296, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
weigh=1296, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
weigh=1296, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
weigh=1288, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
weigh=1288, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
weigh=1296, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
weigh=1288, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
weigh=1296, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
weigh=1296, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
weigh=1296, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
weigh=1288, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
weigh=1296, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
weigh=1296, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
weigh=1288, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
weigh=1288, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
weigh=1296, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
weigh=1288, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
weigh=1288, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
weigh=1288, keyClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey,
valueClass=class
org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl{code}
The total size is around 45KB.
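To make the comparison concrete, here is a small sketch that simply adds up the weigh= values copied from the two dumps above. The class and method names (CacheFootprint, total) are made up for illustration and are not part of Impala.

```java
import java.util.Arrays;

/** Sums the "weigh=" values from the two cache dumps above (all values in bytes). */
final class CacheFootprint {
  // functional_parquet.iceberg_partitioned: table entry, Iceberg meta, BaseTable,
  // 4 column stats, partition list, 1 partition.
  static final long[] ICEBERG = {
      3792, 14960, 30546992, 496, 496, 496, 512, 472, 10328};

  // functional_parquet.alltypes: table entry, 13 column stats, partition list,
  // 24 partitions.
  static final long[] ALLTYPES = {
      4216,
      480, 472, 488, 488, 480, 488, 488, 488, 488, 488, 496, 352, 352,
      4248,
      1296, 1296, 1288, 1296, 1296, 1296, 1296, 1296, 1288, 1288, 1296, 1288,
      1296, 1296, 1296, 1288, 1296, 1296, 1288, 1288, 1296, 1288, 1288, 1288};

  /** Returns the sum of all cache item weights in bytes. */
  static long total(long[] weights) {
    return Arrays.stream(weights).sum();
  }

  public static void main(String[] args) {
    long iceberg = total(ICEBERG);
    long alltypes = total(ALLTYPES);
    System.out.println("iceberg_partitioned total: " + iceberg);  // 30578544
    System.out.println("alltypes total: " + alltypes);            // 45536
    // Share of the Iceberg table's footprint taken by the BaseTable entry alone.
    System.out.printf("BaseTable share: %.2f%%%n", 100.0 * 30546992 / iceberg);
  }
}
```

By these numbers, the single BaseTable entry (30546992 bytes) accounts for almost all of the Iceberg table's 30578544 bytes, while all 39 cache items of alltypes together weigh 45536 bytes.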
It is worth double-checking whether we need to cache the whole
org.apache.iceberg.BaseTable object. Maybe we can extract just what Impala
needs into a custom value class.
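A rough sketch of what such a value class could look like is below. Everything in it is an assumption for illustration: HeavyTable is a stand-in interface, not the real org.apache.iceberg.BaseTable API, and the three fields are placeholders, since which fields Impala actually needs still has to be determined.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in for the heavy cached object. NOT the real Iceberg API. */
interface HeavyTable {
  String location();
  long currentSnapshotId();
  Map<String, String> properties();
}

/** Hypothetical slim, immutable cache value that copies only selected fields. */
final class SlimTableInfo {
  private final String location;
  private final long snapshotId;
  private final Map<String, String> properties;

  private SlimTableInfo(String location, long snapshotId,
      Map<String, String> properties) {
    this.location = location;
    this.snapshotId = snapshotId;
    this.properties = properties;
  }

  /** Copies the needed fields so no reference to the heavy object is retained. */
  static SlimTableInfo from(HeavyTable table) {
    return new SlimTableInfo(
        table.location(),
        table.currentSnapshotId(),
        Collections.unmodifiableMap(new LinkedHashMap<>(table.properties())));
  }

  String location() { return location; }
  long snapshotId() { return snapshotId; }
  Map<String, String> properties() { return properties; }
}
```

The point of copying in from() is that the cache would then hold only the slim object, so the cache weigher would no longer count everything reachable from the original table object.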
CC [~boroknagyz]