[ 
https://issues.apache.org/jira/browse/IMPALA-13470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boglarka Egyed reassigned IMPALA-13470:
---------------------------------------

    Assignee: Noémi Pap-Takács

> Stats loaded twice for Iceberg tables
> -------------------------------------
>
>                 Key: IMPALA-13470
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13470
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Gabor Kaszab
>            Assignee: Noémi Pap-Takács
>            Priority: Major
>              Labels: impala-iceberg
>
> When Impala loads an Iceberg table, the table stats appear to be fetched 
> twice from HMS.
> These are the HMS logs captured while loading an Iceberg table in Impala:
> {code:java}
> 2024-10-07 19:09:52,926 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [TThreadPoolServer WorkerProcess-190]: 194: get_table : 
> tbl=hive.yri_kf_csi.calls
> 2024-10-07 19:09:52,926 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer 
> WorkerProcess-190]: ugi=impala/[email protected]    
> ip=172.20.33.80    cmd=get_table : tbl=hive.yri_kf_csi.calls    
> 2024-10-07 19:09:52,930 INFO  
> org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: 
> [TThreadPoolServer WorkerProcess-190]: Starting translation for processor 
> [email protected] on list 1
> 2024-10-07 19:09:52,930 INFO  
> org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: 
> [TThreadPoolServer WorkerProcess-190]: Table 
> calls,#bucket=0,isBucketed:false,tableType=EXTERNAL_TABLE,tableCapabilities=null
> 2024-10-07 19:09:52,931 INFO  
> org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: 
> [TThreadPoolServer WorkerProcess-190]: Transformer return list of 1
> 2024-10-07 19:09:52,936 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [TThreadPoolServer WorkerProcess-2]: 7: get_all_write_event_info
> 2024-10-07 19:09:52,936 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer 
> WorkerProcess-2]: ugi=impala/[email protected]    
> ip=172.20.33.54    cmd=get_all_write_event_info    
> 2024-10-07 19:09:52,958 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [TThreadPoolServer WorkerProcess-8]: 9: get_config_value: 
> name=hive.exec.default.partition.name defaultValue=__HIVE_DEFAULT_PARTITION__
> 2024-10-07 19:09:52,958 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer 
> WorkerProcess-8]: ugi=impala/[email protected]    
> ip=172.20.33.80    cmd=get_config_value: 
> name=hive.exec.default.partition.name defaultValue=__HIVE_DEFAULT_PARTITION__ 
>    
> 2024-10-07 19:09:52,963 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [TThreadPoolServer WorkerProcess-8]: 9: get_table_statistics_req: 
> table=hive.yri_kf_csi.calls
> 2024-10-07 19:09:52,964 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer 
> WorkerProcess-8]: ugi=impala/[email protected]    
> ip=172.20.33.80    cmd=get_table_statistics_req: table=hive.yri_kf_csi.calls  
>   
> 2024-10-07 19:09:52,971 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [TThreadPoolServer WorkerProcess-8]: 9: get_primary_keys : 
> tbl=hive.yri_kf_csi.calls
> 2024-10-07 19:09:52,971 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer 
> WorkerProcess-8]: ugi=impala/[email protected]    
> ip=172.20.33.80    cmd=get_primary_keys : tbl=hive.yri_kf_csi.calls    
> 2024-10-07 19:09:52,972 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [TThreadPoolServer WorkerProcess-8]: 9: get_foreign_keys : parentdb=null 
> parenttbl=null foreigndb=yri_kf_csi foreigntbl=calls
> 2024-10-07 19:09:52,972 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer 
> WorkerProcess-8]: ugi=impala/[email protected]    
> ip=172.20.33.80    cmd=get_foreign_keys : parentdb=null parenttbl=null 
> foreigndb=yri_kf_csi foreigntbl=calls    
> 2024-10-07 19:09:52,991 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [TThreadPoolServer WorkerProcess-8]: 9: get_table_statistics_req: 
> table=hive.yri_kf_csi.calls
> 2024-10-07 19:09:52,991 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer 
> WorkerProcess-8]: ugi=impala/[email protected]    
> ip=172.20.33.80    cmd=get_table_statistics_req: table=hive.yri_kf_csi.calls  
>   
> 2024-10-07 19:09:52,998 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
> [TThreadPoolServer WorkerProcess-8]: 9: alter_table: hive.yri_kf_csi.calls 
> newtbl=calls
> 2024-10-07 19:09:52,998 INFO  
> org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer 
> WorkerProcess-8]: ugi=impala/[email protected]    
> ip=172.20.33.80    cmd=alter_table: hive.yri_kf_csi.calls newtbl=calls    
>  {code}
> get_table_statistics_req() seems to be called twice for the same table (at 
> 19:09:52,963 and again at 19:09:52,991 in the log above), once in 
> HdfsTable.load() and once in IcebergTable.load().
>  
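One way to confirm the duplicate call from an HMS log is to count the `cmd=` entries on the audit lines. The following is only a minimal sketch, not part of Impala or Hive: the helper names `count_hms_calls` and `repeated_calls` are hypothetical, and `SAMPLE` is an abbreviated copy of the audit fragments quoted above. It assumes each audit entry carries exactly one `cmd=<api>` token.

```python
import re
from collections import Counter

# Captures the API name that follows "cmd=" on an HMS audit line,
# e.g. "cmd=get_table_statistics_req: table=..." -> "get_table_statistics_req".
CMD_RE = re.compile(r"cmd=(\w+)")

# Abbreviated audit fragments taken from the log in the issue description.
SAMPLE = """\
ip=172.20.33.80    cmd=get_table : tbl=hive.yri_kf_csi.calls
ip=172.20.33.80    cmd=get_table_statistics_req: table=hive.yri_kf_csi.calls
ip=172.20.33.80    cmd=get_primary_keys : tbl=hive.yri_kf_csi.calls
ip=172.20.33.80    cmd=get_table_statistics_req: table=hive.yri_kf_csi.calls
ip=172.20.33.80    cmd=alter_table: hive.yri_kf_csi.calls newtbl=calls
"""


def count_hms_calls(log_text: str) -> Counter:
    """Count HMS API calls by name, based on the cmd= audit entries."""
    return Counter(CMD_RE.findall(log_text))


def repeated_calls(log_text: str) -> dict:
    """Return only the API names that appear more than once."""
    return {api: n for api, n in count_hms_calls(log_text).items() if n > 1}


if __name__ == "__main__":
    for api, n in repeated_calls(SAMPLE).items():
        print(f"{api} called {n} times")
```

The sketch counts per API name only. A real log that covers several tables or several clients would need the table name and the `ip=` field added to the counting key.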



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
