[
https://issues.apache.org/jira/browse/IMPALA-13470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Boglarka Egyed reassigned IMPALA-13470:
---------------------------------------
Assignee: Noémi Pap-Takács
> Stats loaded twice for Iceberg tables
> -------------------------------------
>
> Key: IMPALA-13470
> URL: https://issues.apache.org/jira/browse/IMPALA-13470
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Reporter: Gabor Kaszab
> Assignee: Noémi Pap-Takács
> Priority: Major
> Labels: impala-iceberg
>
> When Impala loads an Iceberg table, the table stats appear to be fetched
> from HMS twice.
> The following HMS logs were captured while loading an Iceberg table in Impala:
> {code:java}
> 2024-10-07 19:09:52,926 INFO org.apache.hadoop.hive.metastore.HiveMetaStore:
> [TThreadPoolServer WorkerProcess-190]: 194: get_table :
> tbl=hive.yri_kf_csi.calls
> 2024-10-07 19:09:52,926 INFO
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer
> WorkerProcess-190]: ugi=impala/[email protected]
> ip=172.20.33.80 cmd=get_table : tbl=hive.yri_kf_csi.calls
> 2024-10-07 19:09:52,930 INFO
> org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer:
> [TThreadPoolServer WorkerProcess-190]: Starting translation for processor
> [email protected] on list 1
> 2024-10-07 19:09:52,930 INFO
> org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer:
> [TThreadPoolServer WorkerProcess-190]: Table
> calls,#bucket=0,isBucketed:false,tableType=EXTERNAL_TABLE,tableCapabilities=null
> 2024-10-07 19:09:52,931 INFO
> org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer:
> [TThreadPoolServer WorkerProcess-190]: Transformer return list of 1
> 2024-10-07 19:09:52,936 INFO org.apache.hadoop.hive.metastore.HiveMetaStore:
> [TThreadPoolServer WorkerProcess-2]: 7: get_all_write_event_info
> 2024-10-07 19:09:52,936 INFO
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer
> WorkerProcess-2]: ugi=impala/[email protected]
> ip=172.20.33.54 cmd=get_all_write_event_info
> 2024-10-07 19:09:52,958 INFO org.apache.hadoop.hive.metastore.HiveMetaStore:
> [TThreadPoolServer WorkerProcess-8]: 9: get_config_value:
> name=hive.exec.default.partition.name defaultValue=__HIVE_DEFAULT_PARTITION__
> 2024-10-07 19:09:52,958 INFO
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer
> WorkerProcess-8]: ugi=impala/[email protected]
> ip=172.20.33.80 cmd=get_config_value:
> name=hive.exec.default.partition.name defaultValue=__HIVE_DEFAULT_PARTITION__
>
> 2024-10-07 19:09:52,963 INFO org.apache.hadoop.hive.metastore.HiveMetaStore:
> [TThreadPoolServer WorkerProcess-8]: 9: get_table_statistics_req:
> table=hive.yri_kf_csi.calls
> 2024-10-07 19:09:52,964 INFO
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer
> WorkerProcess-8]: ugi=impala/[email protected]
> ip=172.20.33.80 cmd=get_table_statistics_req: table=hive.yri_kf_csi.calls
>
> 2024-10-07 19:09:52,971 INFO org.apache.hadoop.hive.metastore.HiveMetaStore:
> [TThreadPoolServer WorkerProcess-8]: 9: get_primary_keys :
> tbl=hive.yri_kf_csi.calls
> 2024-10-07 19:09:52,971 INFO
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer
> WorkerProcess-8]: ugi=impala/[email protected]
> ip=172.20.33.80 cmd=get_primary_keys : tbl=hive.yri_kf_csi.calls
> 2024-10-07 19:09:52,972 INFO org.apache.hadoop.hive.metastore.HiveMetaStore:
> [TThreadPoolServer WorkerProcess-8]: 9: get_foreign_keys : parentdb=null
> parenttbl=null foreigndb=yri_kf_csi foreigntbl=calls
> 2024-10-07 19:09:52,972 INFO
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer
> WorkerProcess-8]: ugi=impala/[email protected]
> ip=172.20.33.80 cmd=get_foreign_keys : parentdb=null parenttbl=null
> foreigndb=yri_kf_csi foreigntbl=calls
> 2024-10-07 19:09:52,991 INFO org.apache.hadoop.hive.metastore.HiveMetaStore:
> [TThreadPoolServer WorkerProcess-8]: 9: get_table_statistics_req:
> table=hive.yri_kf_csi.calls
> 2024-10-07 19:09:52,991 INFO
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer
> WorkerProcess-8]: ugi=impala/[email protected]
> ip=172.20.33.80 cmd=get_table_statistics_req: table=hive.yri_kf_csi.calls
>
> 2024-10-07 19:09:52,998 INFO org.apache.hadoop.hive.metastore.HiveMetaStore:
> [TThreadPoolServer WorkerProcess-8]: 9: alter_table: hive.yri_kf_csi.calls
> newtbl=calls
> 2024-10-07 19:09:52,998 INFO
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer
> WorkerProcess-8]: ugi=impala/[email protected]
> ip=172.20.33.80 cmd=alter_table: hive.yri_kf_csi.calls newtbl=calls
> {code}
> get_table_statistics_req() appears to be called twice: once in
> HdfsTable.load() and once in IcebergTable.load().
>
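One way to confirm the duplicate call from an excerpt like the quoted log is to count the cmd= tokens on the HMS audit lines. The following is a minimal Java sketch, assuming the audit line format shown in the quoted log. The class and method names (HmsAuditCallCounter, countCalls) are hypothetical and are not part of Impala or Hive.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Impala or Hive.
class HmsAuditCallCounter {
    // Assumes audit lines carry a "cmd=<name>" token, as in the quoted log,
    // e.g. "cmd=get_table :" or "cmd=get_table_statistics_req:".
    private static final Pattern CMD = Pattern.compile("cmd=([A-Za-z_]+)");

    // Returns how often each HMS command appears, in first-seen order.
    static Map<String, Integer> countCalls(String log) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = CMD.matcher(log);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Shortened sample in the same shape as the quoted audit lines.
        String sample = String.join("\n",
            "ip=172.20.33.80 cmd=get_table : tbl=hive.yri_kf_csi.calls",
            "ip=172.20.33.80 cmd=get_table_statistics_req: table=hive.yri_kf_csi.calls",
            "ip=172.20.33.80 cmd=get_table_statistics_req: table=hive.yri_kf_csi.calls");
        // Any command with a count above 1 is a candidate duplicate fetch.
        System.out.println(countCalls(sample));
    }
}
```

Because the pattern matches only the cmd= token, it works on the email-wrapped log text as well, and it counts each call once even though HMS logs every request on both the regular and the audit logger.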