Neer393 commented on code in PR #6020:
URL: https://github.com/apache/hive/pull/6020#discussion_r2266212049


##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/metastore/task/IcebergHouseKeeperService.java:
##########
@@ -79,35 +76,20 @@ public void run() {
 
   private void expireTables(String catalogName, String dbPattern, String tablePattern) {
     try (IMetaStoreClient msc = new HiveMetaStoreClient(conf)) {
-      // TODO: HIVE-28952 – modify TableFetcher to return HMS Table API objects directly,
-      // avoiding the need for subsequent msc.getTable calls to fetch each matched table individually
-      List<TableName> tables = IcebergTableUtil.getTableFetcher(msc, catalogName, dbPattern, tablePattern).getTables();
-
+      int maxBatchSize = MetastoreConf.getIntVar(conf, MetastoreConf.ConfVars.BATCH_RETRIEVE_MAX);
+      List<org.apache.hadoop.hive.metastore.api.Table> tables =
+          IcebergTableUtil.getTableFetcher(msc, catalogName, dbPattern, tablePattern).getTables(maxBatchSize);
       LOG.debug("{} candidate tables found", tables.size());
-
-      for (TableName table : tables) {
-        try {
-          expireSnapshotsForTable(getIcebergTable(table, msc));
-        } catch (Exception e) {
-          LOG.error("Exception while running iceberg expiry service on catalog/db/table: {}/{}/{}",
-              catalogName, dbPattern, tablePattern, e);
-        }
+      for (org.apache.hadoop.hive.metastore.api.Table table : tables) {
+        expireSnapshotsForTable(getIcebergTable(table));
       }
     } catch (Exception e) {
       throw new RuntimeException("Error while getting tables from metastore", e);
     }
   }
 
-  private Table getIcebergTable(TableName tableName, IMetaStoreClient msc) {
-    return tableCache.get(tableName, key -> {
-      LOG.debug("Getting iceberg table from metastore as it's not present in table cache: {}", tableName);
-      GetTableRequest request = new GetTableRequest(tableName.getDb(), tableName.getTable());
-      try {
-        return IcebergTableUtil.getTable(conf, msc.getTable(request));
-      } catch (TException e) {
-        throw new RuntimeException(e);
-      }
-    });
+  private Table getIcebergTable(org.apache.hadoop.hive.metastore.api.Table table) {
+    return tableCache.get(table, key -> IcebergTableUtil.getTable(conf, table));

Review Comment:
   > I'd also suspect the original `TableName` might not be sufficient because 
the table object might have been updated, or it might not be an Iceberg table 
already
   
   I have a solution to this. What if we keep the current mapping the same, but additionally store a new field recording the timestamp at which the entry was added to the cache?
   Then, to decide whether the cached Iceberg table is stale, we would compare that cache timestamp with the table's last DDL timestamp (stored as `transient_lastDdlTime` in the table parameters). If the last DDL timestamp is greater, we would not return the cached object, but instead force-fetch a fresh IcebergTable.
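   To make the idea concrete, here is a minimal sketch of that staleness check. This is illustrative only, not the Hive code: the class, the `db.table` string key, and the loader callback are all hypothetical stand-ins for the real `tableCache` and metastore fetch; the real table parameters store `transient_lastDdlTime` as epoch seconds, which is what the comparison assumes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of the proposal above: each cache entry remembers when
// it was loaded, and a lookup falls back to a fresh fetch when the table's
// transient_lastDdlTime is newer than the cached-at timestamp.
class StaleAwareTableCache {

  // value = (cached Iceberg table object, epoch-seconds when it was cached)
  record CachedTable(Object icebergTable, long cachedAtEpochSec) {}

  private final Map<String, CachedTable> cache = new ConcurrentHashMap<>();

  /**
   * Returns the cached table unless lastDdlTimeEpochSec (the table's
   * transient_lastDdlTime) is newer than the cache entry, in which case
   * the loader is invoked and the entry is replaced.
   */
  Object get(String dbDotTable, long lastDdlTimeEpochSec,
             Function<String, Object> loader) {
    CachedTable entry = cache.get(dbDotTable);
    if (entry == null || lastDdlTimeEpochSec > entry.cachedAtEpochSec()) {
      // Missing or stale: force-fetch and re-cache with the current time.
      Object fresh = loader.apply(dbDotTable);
      cache.put(dbDotTable,
          new CachedTable(fresh, System.currentTimeMillis() / 1000L));
      return fresh;
    }
    return entry.icebergTable();
  }
}
```

   One subtlety worth deciding on: this still cannot distinguish "dropped and re-created" from "altered" by itself, so it relies on `transient_lastDdlTime` being updated for every DDL that invalidates the cached object.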
   
   How does this sound @okumin ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

