[ 
https://issues.apache.org/jira/browse/IMPALA-11265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17928684#comment-17928684
 ] 

ASF subversion and git services commented on IMPALA-11265:
----------------------------------------------------------

Commit 37e409059437279c960eba71b3bce69ffbd65f2e in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=37e409059 ]

IMPALA-13737: Directly load file metadata via IcebergFileMetadataLoader

Currently we let HdfsTable drive file metadata loading of Iceberg
tables. To have better control over file loading, IcebergTable should
use IcebergFileMetadataLoader directly. The underlying HdfsTable can be
empty, which will make it easier to remove this dependency completely.
Also, it solves the de-duplication of file descriptors in Local Catalog
mode.

This patch also clarifies the responsibilities of
IcebergFileMetadataLoader and IcebergContentFileStore. The former
is in charge of loading the file descriptors and decorating them
with Iceberg metadata. The latter is only responsible for grouping
and storing them in an efficient manner.
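
The split described above can be pictured with a minimal sketch. All
names below (LoaderStoreSketch, FileDesc, load, Store) are hypothetical
and are not Impala classes or signatures. Keying descriptors by path is
only one assumed way to keep a single object per file; the patch may
de-duplicate differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: these are NOT Impala classes or signatures.
class LoaderStoreSketch {

  // A file descriptor plus the metadata the loader attaches to it.
  static final class FileDesc {
    final String path;
    final boolean isDeleteFile;

    FileDesc(String path, boolean isDeleteFile) {
      this.path = path;
      this.isDeleteFile = isDeleteFile;
    }
  }

  // Loader role: create descriptors and decorate them with metadata.
  static List<FileDesc> load(List<String> dataPaths, List<String> deletePaths) {
    List<FileDesc> result = new ArrayList<>();
    for (String p : dataPaths) result.add(new FileDesc(p, false));
    for (String p : deletePaths) result.add(new FileDesc(p, true));
    return result;
  }

  // Store role: group and hold descriptors, nothing else.
  static final class Store {
    private final Map<String, FileDesc> dataFiles = new LinkedHashMap<>();
    private final Map<String, FileDesc> deleteFiles = new LinkedHashMap<>();

    // Keying by path keeps a single descriptor object per file (assumed
    // de-duplication strategy, not taken from the patch).
    void add(FileDesc fd) {
      Map<String, FileDesc> target = fd.isDeleteFile ? deleteFiles : dataFiles;
      target.putIfAbsent(fd.path, fd);
    }

    int numDataFiles() { return dataFiles.size(); }
    int numDeleteFiles() { return deleteFiles.size(); }
  }
}
```

In this sketch the store never touches the table object; it only
receives finished descriptors, which mirrors the stated goal of keeping
the store independent of the table implementation.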

This patch removes the dependency of IcebergContentFileStore on
FeIcebergTable which will make the REST Catalog implementation
cleaner.

Measurements
(Thanks to Gabor Kaszab for the numbers)
As mentioned above, this patch de-duplicates the file descriptors
in local catalog mode, which reduces the memory footprint
(IMPALA-11265) of the Coordinator when local catalog is used.

The measured table had 110k files, 16400 partitions, 1000 manifests,
1000 snapshots. The memory footprint:
Before this patch: 107MB
After this patch:   74MB
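
For a sense of scale, the figures above work out to a reduction of
roughly 31% (33MB out of 107MB). The sketch below redoes that
arithmetic and also derives an average saving per file. The class and
method names are hypothetical, and treating 1MB as 1024 * 1024 bytes is
an assumption, since the measurement does not state the unit
convention.

```java
// Back-of-the-envelope arithmetic on the measured figures.
class FootprintMath {

  // Fraction of memory saved, e.g. 0.25 means the footprint shrank by 25%.
  static double savedFraction(double beforeMb, double afterMb) {
    return (beforeMb - afterMb) / beforeMb;
  }

  // Average bytes saved per file, assuming 1MB = 1024 * 1024 bytes.
  static double bytesSavedPerFile(double beforeMb, double afterMb, long numFiles) {
    return (beforeMb - afterMb) * 1024 * 1024 / numFiles;
  }

  public static void main(String[] args) {
    System.out.printf("saved: %.1f%%%n", 100 * savedFraction(107, 74));
    System.out.printf("per file: %.0f bytes%n",
        bytesSavedPerFile(107, 74, 110_000));
  }
}
```

The per-file number is only an average over the 110k files of this one
table and says nothing about how the saving is distributed.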

Testing:
 * no new functionality added; existing tests should still pass

Change-Id: Iaf7e23ec21b65036b47edadcb4cbe4b64be3baee
Reviewed-on: http://gerrit.cloudera.org:8080/22458
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Iceberg tables have a large memory footprint in catalog cache
> -------------------------------------------------------------
>
>                 Key: IMPALA-11265
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11265
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Gabor Kaszab
>            Priority: Major
>              Labels: impala-iceberg
>             Fix For: Impala 4.5.0
>
>
> During the investigation of IMPALA-11260, I found the cache item size of a 
> (IcebergApiTableCacheKey, org.apache.iceberg.BaseTable) pair could be 30MB.
> For instance, here are the cache items of the iceberg table 
> {{functional_parquet.iceberg_partitioned}}:
> {code:java}
> weigh=3792, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$TableCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$TableMetaRefImpl
> weigh=14960, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$IcebergMetaCacheKey, 
> valueClass=class org.apache.impala.thrift.TPartialTableInfo
> weigh=30546992, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$IcebergApiTableCacheKey, 
> valueClass=class org.apache.iceberg.BaseTable
> weigh=496, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=496, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=496, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=512, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=472, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionListCacheKey, 
> valueClass=class java.util.ArrayList
> weigh=10328, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl{code}
> Note that this table has only 20 rows, yet the total memory footprint is 
> about 30MB.
> For a normal partitioned Parquet table, the memory footprint is much 
> smaller. For instance, here are the cache items for 
> {{functional_parquet.alltypes}}:
> {code:java}
> weigh=4216, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$TableCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$TableMetaRefImpl
> weigh=480, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=472, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=488, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=488, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=480, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=488, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=488, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=488, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=488, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=488, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=496, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=352, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=352, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$ColStatsCacheKey, 
> valueClass=class org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj
> weigh=4248, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionListCacheKey, 
> valueClass=class java.util.ArrayList
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1296, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl
> weigh=1288, keyClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionCacheKey, 
> valueClass=class 
> org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl{code}
> The total size is around 45KB.
> It is worth double-checking whether we need the whole 
> org.apache.iceberg.BaseTable object. Maybe we can just extract what Impala 
> needs into a custom value class.
> CC [~boroknagyz] 
> *Update:*
> I'm not sure how that measurement was performed, but I don't think the 
> difference between the table types comes from the BaseTable object. I think 
> it comes from the way we store the tables, most importantly how we store 
> file descriptors. I've made measurements on the catalogd and also on the 
> coordinator in local-catalog mode. You can find the numbers in this doc:
> [https://docs.google.com/spreadsheets/d/1bTwH6wwy6CVjy1nNw-FGbec8Fi-KM9meFNC8-wtRncI/edit?usp=sharing]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

