[ https://issues.apache.org/jira/browse/IMPALA-11265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885130#comment-17885130 ]

ASF subversion and git services commented on IMPALA-11265:
----------------------------------------------------------

Commit 4680cfd341e5245088cfce1d6d8507e7314182f1 in impala's branch 
refs/heads/master from Gabor Kaszab
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4680cfd34 ]

IMPALA-11265: Part1: Clear GroupContentFiles once used

GroupedContentFiles stores the file descriptors in Iceberg's format and is
used for creating file descriptors in Impala's format. Once this
creation is done, we no longer have to keep these Iceberg ContentFiles.
Dropping these could significantly reduce the memory footprint of an
Iceberg table.
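
A minimal sketch of the convert-then-clear pattern described above. All
class and method names here (SourceFile, CompactDescriptor, GroupedFiles,
convertAndRelease) are hypothetical stand-ins for illustration, not Impala's
or Iceberg's actual classes, and the real change may be structured
differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: hypothetical stand-ins, not Impala's or Iceberg's real classes. */
final class ConvertThenClearSketch {

  /** Stand-in for a heavyweight source-format file entry. */
  static final class SourceFile {
    final String path;
    final long sizeBytes;
    // Simulates bulky per-file metadata that is not needed after conversion.
    final byte[] extraMetadata = new byte[512];

    SourceFile(String path, long sizeBytes) {
      this.path = path;
      this.sizeBytes = sizeBytes;
    }
  }

  /** Stand-in for the compact descriptor that is kept long term. */
  static final class CompactDescriptor {
    final String path;
    final long sizeBytes;

    CompactDescriptor(String path, long sizeBytes) {
      this.path = path;
      this.sizeBytes = sizeBytes;
    }
  }

  /** Stand-in for the grouped container of source-format files. */
  static final class GroupedFiles {
    final List<SourceFile> dataFiles = new ArrayList<>();
    final List<SourceFile> deleteFiles = new ArrayList<>();

    boolean isEmpty() {
      return dataFiles.isEmpty() && deleteFiles.isEmpty();
    }

    /** Drops the references held here; objects not referenced elsewhere can then be collected. */
    void clear() {
      dataFiles.clear();
      deleteFiles.clear();
    }
  }

  /** Builds the compact descriptors, then releases the source entries. */
  static List<CompactDescriptor> convertAndRelease(GroupedFiles grouped) {
    List<CompactDescriptor> result =
        new ArrayList<>(grouped.dataFiles.size() + grouped.deleteFiles.size());
    for (SourceFile f : grouped.dataFiles) {
      result.add(new CompactDescriptor(f.path, f.sizeBytes));
    }
    for (SourceFile f : grouped.deleteFiles) {
      result.add(new CompactDescriptor(f.path, f.sizeBytes));
    }
    // The source entries are no longer needed once the descriptors exist.
    grouped.clear();
    return result;
  }
}
```

The point of the sketch is only the ordering: the container is cleared after
the last read from it, so nothing downstream can depend on the dropped
entries.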

Measurements:
I have a test table that has 110k files. The measurements showed that
cleaning the GroupedContentFiles could reduce the memory size of this
particular table from 140MB to 80MB.
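
For a rough per-file view of these numbers, the arithmetic can be sketched
as follows. This assumes decimal megabytes (1MB = 1,000,000 bytes) and
exactly 110,000 files, neither of which is stated above; the class and
method names are illustrative.

```java
/** Back-of-the-envelope arithmetic for the measurement; names are illustrative. */
final class FootprintArithmetic {
  // Assumption: decimal megabytes. The measurement does not say MB vs MiB.
  static final long BYTES_PER_MB = 1_000_000L;

  /** Total bytes saved when the footprint drops from beforeMb to afterMb. */
  static long savedBytes(long beforeMb, long afterMb) {
    return (beforeMb - afterMb) * BYTES_PER_MB;
  }

  /** Average bytes saved per file, truncated to a whole number. */
  static long savedBytesPerFile(long beforeMb, long afterMb, long fileCount) {
    return savedBytes(beforeMb, afterMb) / fileCount;
  }

  /** Relative reduction, rounded to the nearest whole percent. */
  static long reductionPercent(long beforeMb, long afterMb) {
    return Math.round(100.0 * (beforeMb - afterMb) / beforeMb);
  }

  public static void main(String[] args) {
    System.out.println(savedBytes(140, 80));
    System.out.println(savedBytesPerFile(140, 80, 110_000));
    System.out.println(reductionPercent(140, 80));
  }
}
```

Under those assumptions the saving is 60MB in total, roughly 545 bytes per
file, or about a 43% reduction for this table.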

Change-Id: I1efdd2a46c9675f7461535259e5892ed213a6b21
Reviewed-on: http://gerrit.cloudera.org:8080/21847
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Iceberg tables have a large memory footprint in catalog cache
> -------------------------------------------------------------
>
>                 Key: IMPALA-11265
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11265
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Gabor Kaszab
>            Priority: Major
>              Labels: impala-iceberg
>
> During the investigation of IMPALA-11260, I found the cache item size of a 
> (IcebergApiTableCacheKey, org.apache.iceberg.BaseTable) pair could be 30MB.
> For instance, here are the cache items of the iceberg table 
> {{functional_parquet.iceberg_partitioned}}:
> {code:java}
> weigh=3792, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$TableCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$TableMetaRefImpl
> weigh=14960, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$IcebergMetaCacheKey, 
> valueClass=class org.apache.impala.thrift.TPartialTableInfo
> weigh=30546992, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$IcebergApiTableCacheKey, 
> valueClass=class org.apache.iceberg.BaseTable
> weigh=496, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=496, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=496, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=512, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=472, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionListCacheKey, 
> valueClass=class java.util.ArrayList
> weigh=10328, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl{code}
> Note that this table has only 20 rows. The total memory footprint is about 
> 30MB.
> For a normal partitioned Parquet table, the memory footprint is not that 
> large. For instance, here are the cache items for 
> {{functional_parquet.alltypes}}:
> {code:java}
> weigh=4216, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$TableCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$TableMetaRefImpl
> weigh=480, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=472, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=488, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=488, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=480, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=488, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=488, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=488, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=488, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=488, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=496, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=352, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=352, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=4248, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionListCacheKey, 
> valueClass=class java.util.ArrayList
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl{code}
> The total size is around 45KB.
> It is worth double-checking whether we need the whole 
> org.apache.iceberg.BaseTable object. Maybe we can just extract what Impala 
> needs into a custom value class.
> CC [~boroknagyz] 
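
The totals quoted in the description can be checked by summing the weigh=
values in each dump. A small sketch, assuming the dump is available as a
single string and that each entry carries one weigh=<number> token; the
class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sums the weigh= values found in a cache dump; names are illustrative. */
final class WeighSum {
  // Matches tokens such as "weigh=3792" and captures the number.
  private static final Pattern WEIGH = Pattern.compile("weigh=(\\d+)");

  /** Returns the sum of all weigh= values in the given text (0 if none). */
  static long totalWeigh(String dump) {
    Matcher m = WEIGH.matcher(dump);
    long total = 0;
    while (m.find()) {
      total += Long.parseLong(m.group(1));
    }
    return total;
  }
}
```

Applied to the nine entries listed for functional_parquet.iceberg_partitioned,
the sum is 30,578,544, of which 30,546,992 comes from the single
IcebergApiTableCacheKey entry.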



