okumin commented on code in PR #6020:
URL: https://github.com/apache/hive/pull/6020#discussion_r2268766025


##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/metastore/task/IcebergHouseKeeperService.java:
##########
@@ -79,35 +76,20 @@ public void run() {
 
   private void expireTables(String catalogName, String dbPattern, String 
tablePattern) {
     try (IMetaStoreClient msc = new HiveMetaStoreClient(conf)) {
-      // TODO: HIVE-28952 – modify TableFetcher to return HMS Table API 
objects directly,
-      // avoiding the need for subsequent msc.getTable calls to fetch each 
matched table individually
-      List<TableName> tables = IcebergTableUtil.getTableFetcher(msc, 
catalogName, dbPattern, tablePattern).getTables();
-
+      int maxBatchSize = MetastoreConf.getIntVar(conf, 
MetastoreConf.ConfVars.BATCH_RETRIEVE_MAX);
+      List<org.apache.hadoop.hive.metastore.api.Table> tables =
+          IcebergTableUtil.getTableFetcher(msc, catalogName, dbPattern, 
tablePattern).getTables(maxBatchSize);
       LOG.debug("{} candidate tables found", tables.size());
-
-      for (TableName table : tables) {
-        try {
-          expireSnapshotsForTable(getIcebergTable(table, msc));
-        } catch (Exception e) {
-          LOG.error("Exception while running iceberg expiry service on 
catalog/db/table: {}/{}/{}",
-              catalogName, dbPattern, tablePattern, e);
-        }
+      for (org.apache.hadoop.hive.metastore.api.Table table : tables) {
+        expireSnapshotsForTable(getIcebergTable(table));
       }
     } catch (Exception e) {
       throw new RuntimeException("Error while getting tables from metastore", 
e);
     }
   }
 
-  private Table getIcebergTable(TableName tableName, IMetaStoreClient msc) {
-    return tableCache.get(tableName, key -> {
-      LOG.debug("Getting iceberg table from metastore as it's not present in 
table cache: {}", tableName);
-      GetTableRequest request = new GetTableRequest(tableName.getDb(), 
tableName.getTable());
-      try {
-        return IcebergTableUtil.getTable(conf, msc.getTable(request));
-      } catch (TException e) {
-        throw new RuntimeException(e);
-      }
-    });
+  private Table getIcebergTable(org.apache.hadoop.hive.metastore.api.Table 
table) {
+    return tableCache.get(table, key -> IcebergTableUtil.getTable(conf, 
table));

Review Comment:
   You're right. I didn't notice that the `table` object is needed anyway on a 
cache miss; I assumed we could create `org.apache.iceberg.Table` from a 
metadata.json.
   
   ```
   tableCache.get(table, key -> IcebergTableUtil.getTable(conf, table))
   ```
   
   > we don't need advanced caching here. we use iceberg table ref to call the 
expire snapshots. cache TTL is 1 day, do you some issue with that?
   
   I assume we should expire snapshots on as up-to-date 
`org.apache.iceberg.Table` as possible to include all metadata.json and avoid 
conflicts(as ExpireSnapshot updates the latest metadata, I assume an operation 
against an obsolete Iceberg table might cause one additional retry); It means 
the current one seems to be better than the previous one.
   
https://github.com/apache/iceberg/blob/apache-iceberg-1.9.2/api/src/main/java/org/apache/iceberg/ExpireSnapshots.java#L31-L32



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org

Reply via email to