okumin commented on code in PR #6020: URL: https://github.com/apache/hive/pull/6020#discussion_r2268766025
########## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/metastore/task/IcebergHouseKeeperService.java: ########## @@ -79,35 +76,20 @@ public void run() { private void expireTables(String catalogName, String dbPattern, String tablePattern) { try (IMetaStoreClient msc = new HiveMetaStoreClient(conf)) { - // TODO: HIVE-28952 – modify TableFetcher to return HMS Table API objects directly, - // avoiding the need for subsequent msc.getTable calls to fetch each matched table individually - List<TableName> tables = IcebergTableUtil.getTableFetcher(msc, catalogName, dbPattern, tablePattern).getTables(); - + int maxBatchSize = MetastoreConf.getIntVar(conf, MetastoreConf.ConfVars.BATCH_RETRIEVE_MAX); + List<org.apache.hadoop.hive.metastore.api.Table> tables = + IcebergTableUtil.getTableFetcher(msc, catalogName, dbPattern, tablePattern).getTables(maxBatchSize); LOG.debug("{} candidate tables found", tables.size()); - - for (TableName table : tables) { - try { - expireSnapshotsForTable(getIcebergTable(table, msc)); - } catch (Exception e) { - LOG.error("Exception while running iceberg expiry service on catalog/db/table: {}/{}/{}", - catalogName, dbPattern, tablePattern, e); - } + for (org.apache.hadoop.hive.metastore.api.Table table : tables) { + expireSnapshotsForTable(getIcebergTable(table)); } } catch (Exception e) { throw new RuntimeException("Error while getting tables from metastore", e); } } - private Table getIcebergTable(TableName tableName, IMetaStoreClient msc) { - return tableCache.get(tableName, key -> { - LOG.debug("Getting iceberg table from metastore as it's not present in table cache: {}", tableName); - GetTableRequest request = new GetTableRequest(tableName.getDb(), tableName.getTable()); - try { - return IcebergTableUtil.getTable(conf, msc.getTable(request)); - } catch (TException e) { - throw new RuntimeException(e); - } - }); + private Table getIcebergTable(org.apache.hadoop.hive.metastore.api.Table table) { + return tableCache.get(table, key -> IcebergTableUtil.getTable(conf, table)); Review Comment: You're right. I didn't notice that the `table` object is needed anyway on a cache miss; I assumed we could create `org.apache.iceberg.Table` from a metadata.json. ``` tableCache.get(table, key -> IcebergTableUtil.getTable(conf, table)) ``` Or we might replace it with like this to reduce the memory pressure and speed up comparisons. I don't care too much. ``` org.apache.hadoop.hive.metastore.api.Table table = ??? String(or something better) cacheKey = getCacheKey(table); tableCache.get(cacheKey, ignored -> IcebergTableUtil.getTable(conf, table)) ``` > we don't need advanced caching here. we use iceberg table ref to call the expire snapshots. cache TTL is 1 day, do you some issue with that? I assume we should expire snapshots on as up-to-date `org.apache.iceberg.Table` as possible to include all metadata.json and avoid conflicts(as ExpireSnapshot updates the latest metadata, I assume an operation against an obsolete Iceberg table might cause one additional retry); It means the current one seems to be better than the previous one that uses an table identifier as a cache key. https://github.com/apache/iceberg/blob/apache-iceberg-1.9.2/api/src/main/java/org/apache/iceberg/ExpireSnapshots.java#L31-L32 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org