abstractdog commented on code in PR #5666: URL: https://github.com/apache/hive/pull/5666#discussion_r1989772599
########## ql/src/java/org/apache/hadoop/hive/ql/queryhistory/repository/IcebergRepository.java: ########## @@ -176,4 +193,13 @@ private void prepareConfForWrite() { SessionStateUtil.addCommitInfo(SessionState.getSessionConf(), tableDesc.getTableName(), jobId, 1, Maps.fromProperties(tableDesc.getProperties())); } + + private void expireSnapshots() { Review Comment: took a look, my concern about this whole service is exactly the doTheMagic part: I remember we faced lots of problems with e.g. the PartitionManagementTask because it couldn't have been run in a lightweight fashion, and the common thing is that we need to iterate on all tables of all DBs to achieve what we want here: the best case scenario is that we implement a filter on ICEBERG tables (like: [listTableNamesByFilter](https://github.com/apache/hive/blob/4108ec495399800a1fe3f982697009944b03c8f2/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java#L432C16-L432C38)), kinda pushing down the filter to backend db but even if we do that, we still need lots of code to make it as optimal as possible, just take a look at how [complicated](https://github.com/apache/hive/blob/4108ec495399800a1fe3f982697009944b03c8f2/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java#L98-L140) the partition management task filtering logic is to achieve the same). So looking at the implementations so far, the "getting all iceberg tables" is the very part I'm worried about, and makes me wonder I really want to go towards that direction just the sake of this query history snapshot expiry, which is otherwise a very simple code Focusing on the motivation, which is to handle the query history table—fully loaded and maintained by Hive—we owe it to users to optimize it as much as possible. However, the question is whether we should take on the additional burden in this PR to handle all Iceberg tables visible in the metastore, or keep the scope focused for now. Or maybe the same db and table name filter, but we default it to: "sys" and "query_history" in MetastoreConf? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org