Hi all,

I am currently working on HIVE-29512
<https://issues.apache.org/jira/browse/HIVE-29512> and would like to
get the community’s feedback on the proposed approach.

The HMS backend tables COMPACTION_QUEUE and COMPLETED_COMPACTIONS are used
for both Hive ACID table and Iceberg table compaction flows. However,
cleanup of these records today happens only when a Hive ACID table or
partition is dropped.

For Iceberg tables, it seems straightforward to extend the existing
drop-table cleanup in AcidEventListener:

<https://github.com/apache/hive/blob/3ddcdf4a0221127278e258c74fcd96159c3cb41e/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/AcidEventListener.java#L78>

However, the partition case is more complex. HMS partition APIs (such as add
partitions, drop partitions, etc.) are not invoked by HS2 for Iceberg tables,
so the existing cleanup hooks are not triggered.

One approach I am considering is introducing a new HMS API to explicitly
clean up compaction metadata for partitions. This API could be invoked by
clients when Iceberg partitions are dropped, thereby delegating the
responsibility to the clients.
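
To make the idea concrete, here is a rough, self-contained sketch of what
such a call could look like. Everything in it is hypothetical: the class
names, the method name cleanupPartitionCompactions, and its parameters are
placeholders for discussion, not existing Hive or HMS APIs, and the
in-memory list only stands in for the COMPACTION_QUEUE and
COMPLETED_COMPACTIONS tables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch only: none of these types or methods exist in Hive.
final class CompactionCleanupSketch {

    // Stand-in for one row of COMPACTION_QUEUE or COMPLETED_COMPACTIONS.
    static final class CompactionRecord {
        final String db;
        final String table;
        final String partition; // null for table-level compactions

        CompactionRecord(String db, String table, String partition) {
            this.db = db;
            this.table = table;
            this.partition = partition;
        }
    }

    // Stand-in for the HMS backend tables, held in memory.
    static final class InMemoryCompactionStore {
        private final List<CompactionRecord> rows = new ArrayList<>();

        void add(String db, String table, String partition) {
            rows.add(new CompactionRecord(db, table, partition));
        }

        // Hypothetical API: remove compaction rows that belong to the given
        // partitions of one table. Returns the number of rows removed.
        int cleanupPartitionCompactions(String db, String table,
                                        Set<String> partitionNames) {
            int before = rows.size();
            rows.removeIf(r -> r.db.equals(db)
                    && r.table.equals(table)
                    && r.partition != null
                    && partitionNames.contains(r.partition));
            return before - rows.size();
        }

        int size() {
            return rows.size();
        }
    }

    public static void main(String[] args) {
        InMemoryCompactionStore store = new InMemoryCompactionStore();
        store.add("db1", "ice_tbl", "dt=2024-01-01");
        store.add("db1", "ice_tbl", "dt=2024-01-02");

        // A client would issue this after dropping the Iceberg partition.
        int removed = store.cleanupPartitionCompactions(
                "db1", "ice_tbl", Set.of("dt=2024-01-01"));
        System.out.println("removed=" + removed + ", remaining=" + store.size());
    }
}
```

In this sketch the client passes the dropped partition names explicitly,
and table-level records (null partition) are left untouched. Whether the
real API should take partition names, partition specs, or something else
is one of the open questions.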

I would appreciate your thoughts on:

   - Whether introducing a new HMS API is the right approach
   - Alternative designs to handle partition-level cleanup

Thanks in advance for your feedback.

Regards,

Venu
