Hi all,

I am currently working on HIVE-29512 <https://issues.apache.org/jira/browse/HIVE-29512> and would like to get the community’s feedback on the proposed approach.

The HMS backend tables COMPACTION_QUEUE and COMPLETED_COMPACTIONS are used by the compaction flows of both Hive ACID tables and Iceberg tables. However, records in these tables are cleaned up today only when a Hive ACID table or partition is dropped.

For Iceberg tables, it seems straightforward to extend the cleanup on drop table in AcidEventListener:
https://github.com/apache/hive/blob/3ddcdf4a0221127278e258c74fcd96159c3cb41e/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/AcidEventListener.java#L78

The partition case is more complex. HS2 does not invoke the HMS partition APIs (add partitions, drop partitions, etc.) for Iceberg tables, so the existing cleanup hooks are never triggered.

One approach I am considering is to introduce a new HMS API that explicitly cleans up compaction metadata for partitions. Clients would invoke this API when Iceberg partitions are dropped, which delegates the responsibility for cleanup to the clients.

I would appreciate your thoughts on:
- Whether introducing a new HMS API is the right approach
- Alternative designs to handle partition-level cleanup

Thanks in advance for your feedback.

Regards,
Venu
