Gabor Kaszab created IMPALA-13470:
-------------------------------------

             Summary: Stats loaded twice for Iceberg tables
                 Key: IMPALA-13470
                 URL: https://issues.apache.org/jira/browse/IMPALA-13470
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
            Reporter: Gabor Kaszab


When Impala loads an Iceberg table, the table stats appear to be loaded twice 
from HMS.
These are the HMS logs captured while loading an Iceberg table in Impala:
{code:java}
2024-10-07 19:09:52,926 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
[TThreadPoolServer WorkerProcess-190]: 194: get_table : 
tbl=hive.yri_kf_csi.calls
2024-10-07 19:09:52,926 INFO  
org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer 
WorkerProcess-190]: ugi=impala/[email protected]    
ip=172.20.33.80    cmd=get_table : tbl=hive.yri_kf_csi.calls    
2024-10-07 19:09:52,930 INFO  
org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: 
[TThreadPoolServer WorkerProcess-190]: Starting translation for processor 
[email protected] on list 1
2024-10-07 19:09:52,930 INFO  
org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: 
[TThreadPoolServer WorkerProcess-190]: Table 
calls,#bucket=0,isBucketed:false,tableType=EXTERNAL_TABLE,tableCapabilities=null
2024-10-07 19:09:52,931 INFO  
org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: 
[TThreadPoolServer WorkerProcess-190]: Transformer return list of 1
2024-10-07 19:09:52,936 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
[TThreadPoolServer WorkerProcess-2]: 7: get_all_write_event_info
2024-10-07 19:09:52,936 INFO  
org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer 
WorkerProcess-2]: ugi=impala/[email protected]    
ip=172.20.33.54    cmd=get_all_write_event_info    
2024-10-07 19:09:52,958 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
[TThreadPoolServer WorkerProcess-8]: 9: get_config_value: 
name=hive.exec.default.partition.name defaultValue=__HIVE_DEFAULT_PARTITION__
2024-10-07 19:09:52,958 INFO  
org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer 
WorkerProcess-8]: ugi=impala/[email protected]    
ip=172.20.33.80    cmd=get_config_value: name=hive.exec.default.partition.name 
defaultValue=__HIVE_DEFAULT_PARTITION__    
2024-10-07 19:09:52,963 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
[TThreadPoolServer WorkerProcess-8]: 9: get_table_statistics_req: 
table=hive.yri_kf_csi.calls
2024-10-07 19:09:52,964 INFO  
org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer 
WorkerProcess-8]: ugi=impala/[email protected]    
ip=172.20.33.80    cmd=get_table_statistics_req: table=hive.yri_kf_csi.calls    
2024-10-07 19:09:52,971 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
[TThreadPoolServer WorkerProcess-8]: 9: get_primary_keys : 
tbl=hive.yri_kf_csi.calls
2024-10-07 19:09:52,971 INFO  
org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer 
WorkerProcess-8]: ugi=impala/[email protected]    
ip=172.20.33.80    cmd=get_primary_keys : tbl=hive.yri_kf_csi.calls    
2024-10-07 19:09:52,972 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
[TThreadPoolServer WorkerProcess-8]: 9: get_foreign_keys : parentdb=null 
parenttbl=null foreigndb=yri_kf_csi foreigntbl=calls
2024-10-07 19:09:52,972 INFO  
org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer 
WorkerProcess-8]: ugi=impala/[email protected]    
ip=172.20.33.80    cmd=get_foreign_keys : parentdb=null parenttbl=null 
foreigndb=yri_kf_csi foreigntbl=calls    
2024-10-07 19:09:52,991 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
[TThreadPoolServer WorkerProcess-8]: 9: get_table_statistics_req: 
table=hive.yri_kf_csi.calls
2024-10-07 19:09:52,991 INFO  
org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer 
WorkerProcess-8]: ugi=impala/[email protected]    
ip=172.20.33.80    cmd=get_table_statistics_req: table=hive.yri_kf_csi.calls    
2024-10-07 19:09:52,998 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
[TThreadPoolServer WorkerProcess-8]: 9: alter_table: hive.yri_kf_csi.calls 
newtbl=calls
2024-10-07 19:09:52,998 INFO  
org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer 
WorkerProcess-8]: ugi=impala/[email protected]    
ip=172.20.33.80    cmd=alter_table: hive.yri_kf_csi.calls newtbl=calls    
 {code}
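The log shows get_table_statistics_req() being called twice for the same table 
(at 19:09:52,963 and again at 19:09:52,991). The two calls seem to come from 
HdfsTable.load() and IcebergTable.load(), one from each.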
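
To check for the duplicate call in a larger HMS log, a small script can count the audit entries per command. The sketch below is only an illustration, not part of Impala or Hive. The helper names (count_hms_calls, repeated_calls) are hypothetical, and it assumes that the command name always follows "cmd=" on the same line, as it does in the audit entries above. The sample input is an abbreviated copy of those entries with the timestamps and ugi fields left out.

```python
import re
from collections import Counter

# Matches the command name in an HMS audit entry, e.g. "cmd=get_table : tbl=...".
# Assumption: the name directly follows "cmd=" and consists of word characters.
AUDIT_CMD = re.compile(r"cmd=(\w+)")


def count_hms_calls(log_text: str) -> Counter:
    """Count HMS API calls by command name, using only the audit entries."""
    return Counter(AUDIT_CMD.findall(log_text))


def repeated_calls(log_text: str) -> dict:
    """Return the commands that appear more than once, with their counts."""
    return {cmd: n for cmd, n in count_hms_calls(log_text).items() if n > 1}


# Abbreviated audit entries from the log above (timestamps and ugi omitted).
SAMPLE = """\
ip=172.20.33.80    cmd=get_table : tbl=hive.yri_kf_csi.calls
ip=172.20.33.54    cmd=get_all_write_event_info
ip=172.20.33.80    cmd=get_config_value: name=hive.exec.default.partition.name
ip=172.20.33.80    cmd=get_table_statistics_req: table=hive.yri_kf_csi.calls
ip=172.20.33.80    cmd=get_primary_keys : tbl=hive.yri_kf_csi.calls
ip=172.20.33.80    cmd=get_foreign_keys : parentdb=null parenttbl=null
ip=172.20.33.80    cmd=get_table_statistics_req: table=hive.yri_kf_csi.calls
ip=172.20.33.80    cmd=alter_table: hive.yri_kf_csi.calls newtbl=calls
"""

if __name__ == "__main__":
    print(repeated_calls(SAMPLE))
```

Only the audit lines are counted because each call is logged twice, once by HiveMetaStore and once by HiveMetaStore.audit, so counting both would double every command.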

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
