Neer393 commented on code in PR #6020:
URL: https://github.com/apache/hive/pull/6020#discussion_r2266157898


##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/metastore/task/IcebergHouseKeeperService.java:
##########
@@ -79,35 +76,20 @@ public void run() {
 
   private void expireTables(String catalogName, String dbPattern, String tablePattern) {
     try (IMetaStoreClient msc = new HiveMetaStoreClient(conf)) {
-      // TODO: HIVE-28952 – modify TableFetcher to return HMS Table API objects directly,
-      // avoiding the need for subsequent msc.getTable calls to fetch each matched table individually
-      List<TableName> tables = IcebergTableUtil.getTableFetcher(msc, catalogName, dbPattern, tablePattern).getTables();
-
+      int maxBatchSize = MetastoreConf.getIntVar(conf, MetastoreConf.ConfVars.BATCH_RETRIEVE_MAX);
+      List<org.apache.hadoop.hive.metastore.api.Table> tables =
+          IcebergTableUtil.getTableFetcher(msc, catalogName, dbPattern, tablePattern).getTables(maxBatchSize);
       LOG.debug("{} candidate tables found", tables.size());
-
-      for (TableName table : tables) {
-        try {
-          expireSnapshotsForTable(getIcebergTable(table, msc));
-        } catch (Exception e) {
-          LOG.error("Exception while running iceberg expiry service on catalog/db/table: {}/{}/{}",
-              catalogName, dbPattern, tablePattern, e);
-        }
+      for (org.apache.hadoop.hive.metastore.api.Table table : tables) {
+        expireSnapshotsForTable(getIcebergTable(table));
       }
     } catch (Exception e) {
       throw new RuntimeException("Error while getting tables from metastore", e);
     }
   }
 
-  private Table getIcebergTable(TableName tableName, IMetaStoreClient msc) {
-    return tableCache.get(tableName, key -> {
-      LOG.debug("Getting iceberg table from metastore as it's not present in table cache: {}", tableName);
-      GetTableRequest request = new GetTableRequest(tableName.getDb(), tableName.getTable());
-      try {
-        return IcebergTableUtil.getTable(conf, msc.getTable(request));
-      } catch (TException e) {
-        throw new RuntimeException(e);
-      }
-    });
+  private Table getIcebergTable(org.apache.hadoop.hive.metastore.api.Table table) {
+    return tableCache.get(table, key -> IcebergTableUtil.getTable(conf, table));

Review Comment:
   @okumin I also agree that storing the whole table object as a key might be too heavy for the cache.
   Your proposed solution of using the metadata location as the key is good and would handle the cases you mentioned.
   
   The reason I did it this way is that we wanted to reduce the number of msc calls made here in IcebergHouseKeeperService.java. With a cache mapping from table name to Iceberg table object, we have to call ```getTableObjectByName()``` for each table name, which increases the number of msc calls: from the tableName we fetch the hmsTable (one msc call per table) and only then build the Iceberg table from it.
   
   Now, if we instead make the cache a mapping from metadata location to Iceberg table:
   First, we would have to fetch all the metadata locations up front.
   Second, when a metadata location key is not present in the cache, we would need a function like ```IcebergTableUtil.getTableByMetadataLocation()```, and I don't know how we would create an Iceberg Table object from just a metadata location.
   
   The point is that it still wouldn't reduce our msc calls.
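   
   Just to illustrate, a rough sketch of what a metadata-location-keyed cache might look like if it still starts from the batched HMS Table objects. This is not code from this PR: the `metadata_location` table property and a `Cache<String, Table>` type for `tableCache` are assumptions here, and either way the msc call count stays the same as with the current approach:
   
   ```java
   // Hypothetical sketch: key the cache by the table's metadata location string,
   // but still build the Iceberg table from the HMS Table object that the batched
   // getTables(maxBatchSize) call already returned, so there is no extra msc call on a miss.
   // Assumes tableCache is a Cache<String, org.apache.iceberg.Table> and that the Iceberg
   // HMS catalog has written the "metadata_location" property on every matched table.
   private Table getIcebergTable(org.apache.hadoop.hive.metastore.api.Table hmsTable) {
     String metadataLocation = hmsTable.getParameters().get("metadata_location");
     return tableCache.get(metadataLocation, key -> IcebergTableUtil.getTable(conf, hmsTable));
   }
   ```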



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

