Gabor Kaszab created IMPALA-13470:
-------------------------------------
Summary: Stats loaded twice for Iceberg tables
Key: IMPALA-13470
URL: https://issues.apache.org/jira/browse/IMPALA-13470
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Reporter: Gabor Kaszab
When we load an Iceberg table, the table stats appear to be loaded twice from
HMS.
These are the HMS logs produced while Impala loads an Iceberg table:
{code:java}
2024-10-07 19:09:52,926 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-190]: 194: get_table : tbl=hive.yri_kf_csi.calls
2024-10-07 19:09:52,926 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-190]: ugi=impala/[email protected] ip=172.20.33.80 cmd=get_table : tbl=hive.yri_kf_csi.calls
2024-10-07 19:09:52,930 INFO org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: [TThreadPoolServer WorkerProcess-190]: Starting translation for processor [email protected] on list 1
2024-10-07 19:09:52,930 INFO org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: [TThreadPoolServer WorkerProcess-190]: Table calls,#bucket=0,isBucketed:false,tableType=EXTERNAL_TABLE,tableCapabilities=null
2024-10-07 19:09:52,931 INFO org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: [TThreadPoolServer WorkerProcess-190]: Transformer return list of 1
2024-10-07 19:09:52,936 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-2]: 7: get_all_write_event_info
2024-10-07 19:09:52,936 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-2]: ugi=impala/[email protected] ip=172.20.33.54 cmd=get_all_write_event_info
2024-10-07 19:09:52,958 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-8]: 9: get_config_value: name=hive.exec.default.partition.name defaultValue=__HIVE_DEFAULT_PARTITION__
2024-10-07 19:09:52,958 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-8]: ugi=impala/[email protected] ip=172.20.33.80 cmd=get_config_value: name=hive.exec.default.partition.name defaultValue=__HIVE_DEFAULT_PARTITION__
2024-10-07 19:09:52,963 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-8]: 9: get_table_statistics_req: table=hive.yri_kf_csi.calls
2024-10-07 19:09:52,964 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-8]: ugi=impala/[email protected] ip=172.20.33.80 cmd=get_table_statistics_req: table=hive.yri_kf_csi.calls
2024-10-07 19:09:52,971 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-8]: 9: get_primary_keys : tbl=hive.yri_kf_csi.calls
2024-10-07 19:09:52,971 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-8]: ugi=impala/[email protected] ip=172.20.33.80 cmd=get_primary_keys : tbl=hive.yri_kf_csi.calls
2024-10-07 19:09:52,972 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-8]: 9: get_foreign_keys : parentdb=null parenttbl=null foreigndb=yri_kf_csi foreigntbl=calls
2024-10-07 19:09:52,972 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-8]: ugi=impala/[email protected] ip=172.20.33.80 cmd=get_foreign_keys : parentdb=null parenttbl=null foreigndb=yri_kf_csi foreigntbl=calls
2024-10-07 19:09:52,991 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-8]: 9: get_table_statistics_req: table=hive.yri_kf_csi.calls
2024-10-07 19:09:52,991 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-8]: ugi=impala/[email protected] ip=172.20.33.80 cmd=get_table_statistics_req: table=hive.yri_kf_csi.calls
2024-10-07 19:09:52,998 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-8]: 9: alter_table: hive.yri_kf_csi.calls newtbl=calls
2024-10-07 19:09:52,998 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-8]: ugi=impala/[email protected] ip=172.20.33.80 cmd=alter_table: hive.yri_kf_csi.calls newtbl=calls
{code}
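To spot the duplicate in a longer HMS log, it can help to count the non-audit get_table_statistics_req entries per table. The following is a minimal Java sketch, assuming each log entry sits on a single line and that the non-audit entry has the `<id>: get_table_statistics_req: table=<name>` shape seen in the log above; the class and method names (StatsCallCounter, countStatsRequests) are hypothetical and not part of Impala or Hive.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StatsCallCounter {
    // Matches the non-audit HiveMetaStore entry, e.g.
    // "...: 9: get_table_statistics_req: table=hive.yri_kf_csi.calls".
    private static final Pattern STATS_REQ =
        Pattern.compile("\\d+: get_table_statistics_req: table=(\\S+)");

    // Abridged entries from the HMS log in this issue (audit line shortened).
    private static final List<String> SAMPLE = List.of(
        "2024-10-07 19:09:52,963 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-8]: 9: get_table_statistics_req: table=hive.yri_kf_csi.calls",
        "2024-10-07 19:09:52,964 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-8]: ip=172.20.33.80 cmd=get_table_statistics_req: table=hive.yri_kf_csi.calls",
        "2024-10-07 19:09:52,971 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-8]: 9: get_primary_keys : tbl=hive.yri_kf_csi.calls",
        "2024-10-07 19:09:52,991 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-8]: 9: get_table_statistics_req: table=hive.yri_kf_csi.calls");

    // Counts stats requests per table. Audit lines are skipped so that each
    // RPC is counted once rather than twice.
    static Map<String, Integer> countStatsRequests(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            if (line.contains("HiveMetaStore.audit")) {
                continue;
            }
            Matcher m = STATS_REQ.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Read a log file if a path is given, otherwise use the sample.
        List<String> lines = args.length > 0 ? Files.readAllLines(Path.of(args[0])) : SAMPLE;
        System.out.println(countStatsRequests(lines)); // sample prints {hive.yri_kf_csi.calls=2}
    }
}
```

A count above 1 for a single table load points at the duplicated request described in this issue.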
get_table_statistics_req() appears to be called twice (at 19:09:52,963 and
19:09:52,991 in the log above): once in HdfsTable.load() and once in
IcebergTable.load().
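The call pattern behind the duplicate can be modeled in a few lines. The following Java sketch is purely illustrative: FakeMetastoreClient, HdfsLoaderSketch and IcebergTableSketch are hypothetical classes, not Impala's HdfsTable or IcebergTable, and the sketch assumes, for illustration only, that the Iceberg load path runs an HDFS-style load that already fetches stats and then requests them again. Reusing the stats from the first request reduces the count from two to one.

```java
import java.util.Map;

public class DoubleStatsLoadSketch {
    // Hypothetical stand-in for an HMS client. It counts how often table
    // statistics are requested, mirroring get_table_statistics_req calls.
    static class FakeMetastoreClient {
        int statsRequests = 0;

        Map<String, Long> getTableStatistics(String table) {
            statsRequests++;
            return Map.of("numRows", 0L); // placeholder value, not real stats
        }
    }

    // Hypothetical HDFS-style loader that fetches stats as part of load().
    static class HdfsLoaderSketch {
        Map<String, Long> stats;

        void load(FakeMetastoreClient client, String table) {
            stats = client.getTableStatistics(table);
        }
    }

    // Hypothetical Iceberg-style table that delegates to the HDFS-style loader.
    static class IcebergTableSketch {
        private final HdfsLoaderSketch hdfsLoader = new HdfsLoaderSketch();
        private final boolean reuseStats;
        Map<String, Long> stats;

        IcebergTableSketch(boolean reuseStats) {
            this.reuseStats = reuseStats;
        }

        void load(FakeMetastoreClient client, String table) {
            hdfsLoader.load(client, table); // first stats request
            if (reuseStats) {
                stats = hdfsLoader.stats; // reuse: no second request
            } else {
                stats = client.getTableStatistics(table); // duplicate request
            }
        }
    }

    // Returns the number of stats requests issued by one table load.
    static int statsRequestsDuringLoad(boolean reuseStats) {
        FakeMetastoreClient client = new FakeMetastoreClient();
        new IcebergTableSketch(reuseStats).load(client, "hive.yri_kf_csi.calls");
        return client.statsRequests;
    }

    public static void main(String[] args) {
        System.out.println(statsRequestsDuringLoad(false)); // prints 2
        System.out.println(statsRequestsDuringLoad(true));  // prints 1
    }
}
```

Whether the real fix should reuse the stats, skip the first fetch for Iceberg tables, or restructure the load path is left open here; the sketch only shows why two requests appear in the log.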
--
This message was sent by Atlassian Jira
(v8.20.10#820010)