Neer393 commented on code in PR #6020: URL: https://github.com/apache/hive/pull/6020#discussion_r2266157898
########## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/metastore/task/IcebergHouseKeeperService.java: ########## @@ -79,35 +76,20 @@ public void run() { private void expireTables(String catalogName, String dbPattern, String tablePattern) { try (IMetaStoreClient msc = new HiveMetaStoreClient(conf)) { - // TODO: HIVE-28952 – modify TableFetcher to return HMS Table API objects directly, - // avoiding the need for subsequent msc.getTable calls to fetch each matched table individually - List<TableName> tables = IcebergTableUtil.getTableFetcher(msc, catalogName, dbPattern, tablePattern).getTables(); - + int maxBatchSize = MetastoreConf.getIntVar(conf, MetastoreConf.ConfVars.BATCH_RETRIEVE_MAX); + List<org.apache.hadoop.hive.metastore.api.Table> tables = + IcebergTableUtil.getTableFetcher(msc, catalogName, dbPattern, tablePattern).getTables(maxBatchSize); LOG.debug("{} candidate tables found", tables.size()); - - for (TableName table : tables) { - try { - expireSnapshotsForTable(getIcebergTable(table, msc)); - } catch (Exception e) { - LOG.error("Exception while running iceberg expiry service on catalog/db/table: {}/{}/{}", - catalogName, dbPattern, tablePattern, e); - } + for (org.apache.hadoop.hive.metastore.api.Table table : tables) { + expireSnapshotsForTable(getIcebergTable(table)); } } catch (Exception e) { throw new RuntimeException("Error while getting tables from metastore", e); } } - private Table getIcebergTable(TableName tableName, IMetaStoreClient msc) { - return tableCache.get(tableName, key -> { - LOG.debug("Getting iceberg table from metastore as it's not present in table cache: {}", tableName); - GetTableRequest request = new GetTableRequest(tableName.getDb(), tableName.getTable()); - try { - return IcebergTableUtil.getTable(conf, msc.getTable(request)); - } catch (TException e) { - throw new RuntimeException(e); - } - }); + private Table getIcebergTable(org.apache.hadoop.hive.metastore.api.Table table) { + return tableCache.get(table, key -> IcebergTableUtil.getTable(conf, table)); Review Comment: @okumin even I agree that storing the whole table object as a key might be too much heavy for cache. Your proposed solution of using metadata location as the key is good and would handle cases what you mentioned The reason I did this was, we wanted to reduce the number of msc calls made here in IcebergHouseKeeperService.java. In the case where the cache mapping is between table name and Iceberg table object, we need to call ```getTableObjectByName()``` for each table name which increases the number of msc calls because from tableName we get hmsTable (making msc call) and from that we get IcebergTable. Now if we make the cache mapping of metadata location and IcebergTable First of all fetching all metadatalocations would be required. Secondly in case the metadatalocation key is not present in cache, we would need a function like ```IcebergTableUtil.getTableByMetadataLocation()``` which I don't know how we will create Iceberg Table object from metadata location . The point is it still won't reduce our msc calls. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org