abstractdog commented on code in PR #5666:
URL: https://github.com/apache/hive/pull/5666#discussion_r1989772599


##########
ql/src/java/org/apache/hadoop/hive/ql/queryhistory/repository/IcebergRepository.java:
##########
@@ -176,4 +193,13 @@ private void prepareConfForWrite() {
     SessionStateUtil.addCommitInfo(SessionState.getSessionConf(), 
tableDesc.getTableName(), jobId, 1,
         Maps.fromProperties(tableDesc.getProperties()));
   }
+
+  private void expireSnapshots() {

Review Comment:
   took a look, my concern about this whole service is exactly the doTheMagic 
part: I remember we faced lots of problems with e.g. the 
PartitionManagementTask because it couldn't have been run in a lightweight 
fashion, and the common thing is that we need to iterate on all tables of all 
DBs to achieve what we want here: the best case scenario is that we implement a 
filter on ICEBERG tables (like: 
[listTableNamesByFilter](https://github.com/apache/hive/blob/4108ec495399800a1fe3f982697009944b03c8f2/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java#L432C16-L432C38)),
 kinda pushing down the filter to backend db
   but even if we do that, we still need lots of code to make it as optimal as 
possible, just take a look at how 
[complicated](https://github.com/apache/hive/blob/4108ec495399800a1fe3f982697009944b03c8f2/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java#L98-L140)
 the partition management task filtering logic is to achieve the same).
   So looking at the implementations so far, the "getting all iceberg tables" 
is the very part I'm worried about, and makes me wonder I really want to go 
towards that direction just the sake of this query history snapshot expiry, 
which is otherwise a very simple code
   
   Focusing on the motivation, which is to handle the query history table—fully 
loaded and maintained by Hive—we owe it to users to optimize it as much as 
possible. However, the question is whether we should take on the additional 
burden in this PR to handle all Iceberg tables visible in the metastore, or 
keep the scope focused for now.
   
   Or maybe the same db + table name filter, but we default it to: "sys" and 
"query_history"?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org

Reply via email to