Neer393 commented on code in PR #6020: URL: https://github.com/apache/hive/pull/6020#discussion_r2266212049
########## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/metastore/task/IcebergHouseKeeperService.java: ########## @@ -79,35 +76,20 @@ public void run() { private void expireTables(String catalogName, String dbPattern, String tablePattern) { try (IMetaStoreClient msc = new HiveMetaStoreClient(conf)) { - // TODO: HIVE-28952 – modify TableFetcher to return HMS Table API objects directly, - // avoiding the need for subsequent msc.getTable calls to fetch each matched table individually - List<TableName> tables = IcebergTableUtil.getTableFetcher(msc, catalogName, dbPattern, tablePattern).getTables(); - + int maxBatchSize = MetastoreConf.getIntVar(conf, MetastoreConf.ConfVars.BATCH_RETRIEVE_MAX); + List<org.apache.hadoop.hive.metastore.api.Table> tables = + IcebergTableUtil.getTableFetcher(msc, catalogName, dbPattern, tablePattern).getTables(maxBatchSize); LOG.debug("{} candidate tables found", tables.size()); - - for (TableName table : tables) { - try { - expireSnapshotsForTable(getIcebergTable(table, msc)); - } catch (Exception e) { - LOG.error("Exception while running iceberg expiry service on catalog/db/table: {}/{}/{}", - catalogName, dbPattern, tablePattern, e); - } + for (org.apache.hadoop.hive.metastore.api.Table table : tables) { + expireSnapshotsForTable(getIcebergTable(table)); } } catch (Exception e) { throw new RuntimeException("Error while getting tables from metastore", e); } } - private Table getIcebergTable(TableName tableName, IMetaStoreClient msc) { - return tableCache.get(tableName, key -> { - LOG.debug("Getting iceberg table from metastore as it's not present in table cache: {}", tableName); - GetTableRequest request = new GetTableRequest(tableName.getDb(), tableName.getTable()); - try { - return IcebergTableUtil.getTable(conf, msc.getTable(request)); - } catch (TException e) { - throw new RuntimeException(e); - } - }); + private Table getIcebergTable(org.apache.hadoop.hive.metastore.api.Table table) { + return tableCache.get(table, key -> IcebergTableUtil.getTable(conf, table)); Review Comment: > I'd also suspect the original `TableName` might not be sufficient because the table object might have been updated, or it might not be an Iceberg table already I have a solution to this. What if we keep the current mapping same but along with this, we add a new field which indicates the timestamp this was added to cache. So if we want to know if the Iceberg Table in cache is stale or not, what we would do is compare the timestamps i.e the lastDDLTimestamp of the table and the cache timestamp. If the lastDDLTimestamp is greater, we would not return the cache object but instead force fetch the new IcebergTable. How does this sound @okumin ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org