okumin commented on code in PR #6020:
URL: https://github.com/apache/hive/pull/6020#discussion_r2268766025
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/metastore/task/IcebergHouseKeeperService.java:
##########
@@ -79,35 +76,20 @@ public void run() {
private void expireTables(String catalogName, String dbPattern, String
tablePattern) {
try (IMetaStoreClient msc = new HiveMetaStoreClient(conf)) {
- // TODO: HIVE-28952 – modify TableFetcher to return HMS Table API
objects directly,
- // avoiding the need for subsequent msc.getTable calls to fetch each
matched table individually
- List<TableName> tables = IcebergTableUtil.getTableFetcher(msc,
catalogName, dbPattern, tablePattern).getTables();
-
+ int maxBatchSize = MetastoreConf.getIntVar(conf,
MetastoreConf.ConfVars.BATCH_RETRIEVE_MAX);
+ List<org.apache.hadoop.hive.metastore.api.Table> tables =
+ IcebergTableUtil.getTableFetcher(msc, catalogName, dbPattern,
tablePattern).getTables(maxBatchSize);
LOG.debug("{} candidate tables found", tables.size());
-
- for (TableName table : tables) {
- try {
- expireSnapshotsForTable(getIcebergTable(table, msc));
- } catch (Exception e) {
- LOG.error("Exception while running iceberg expiry service on
catalog/db/table: {}/{}/{}",
- catalogName, dbPattern, tablePattern, e);
- }
+ for (org.apache.hadoop.hive.metastore.api.Table table : tables) {
+ expireSnapshotsForTable(getIcebergTable(table));
}
} catch (Exception e) {
throw new RuntimeException("Error while getting tables from metastore",
e);
}
}
- private Table getIcebergTable(TableName tableName, IMetaStoreClient msc) {
- return tableCache.get(tableName, key -> {
- LOG.debug("Getting iceberg table from metastore as it's not present in
table cache: {}", tableName);
- GetTableRequest request = new GetTableRequest(tableName.getDb(),
tableName.getTable());
- try {
- return IcebergTableUtil.getTable(conf, msc.getTable(request));
- } catch (TException e) {
- throw new RuntimeException(e);
- }
- });
+ private Table getIcebergTable(org.apache.hadoop.hive.metastore.api.Table
table) {
+ return tableCache.get(table, key -> IcebergTableUtil.getTable(conf,
table));
Review Comment:
You're right. I didn't notice that the `table` object is needed anyway on a
cache miss; I assumed we could create `org.apache.iceberg.Table` from a
metadata.json.
```
tableCache.get(table, key -> IcebergTableUtil.getTable(conf, table))
```
> we don't need advanced caching here. we use iceberg table ref to call the
expire snapshots. cache TTL is 1 day, do you some issue with that?
I assume we should expire snapshots on as up-to-date
`org.apache.iceberg.Table` as possible to include all metadata.json and avoid
conflicts(as ExpireSnapshot updates the latest metadata, I assume an operation
against an obsolete Iceberg table might cause one additional retry); It means
the current one seems to be better than the previous one.
https://github.com/apache/iceberg/blob/apache-iceberg-1.9.2/api/src/main/java/org/apache/iceberg/ExpireSnapshots.java#L31-L32
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]