Hi Team, I have seen several complaints when HiveCatalog was used and the locks were not removed correctly [1], [2] and suggested solutions [3]. Also did my share to fix the situation as much as possible [4]. The issue is that every fix is just a band-aid when the client can disappear and leave behind an abandoned lock, this way preventing further changes on the given Iceberg table.
To fix this I propose the following solution: 1. We can provide a minimal Hive feature to check the previous metadata location transactionally before altering a table and setting the new snapshot 2. We can have a configuration to turn off using the Hive locks in the HiveCatalog and we can rely on the new Hive feature to provide us the atomicity which is needed for an Iceberg commit I have created the above mentioned minimal Hive feature request (HIVE-26882 [5]) and the PR to fix it [6] also, with the approval of the Hive community backported and merged this feature to the active branches: branch-3 [7], branch-3.1 [8], branch-2 [9], branch-2.3 [10]. This means that any of the next Hive releases will contain this fix, and any company having their own forks could easily cherry-pick this minimal change. The first upcoming upstream Hive release with this feature would be the Hive 3.2.0 which is expected to come around 2-3 months [11]. Also I have created the Iceberg PR to start using the new feature [12]. This has 2 parts: 1. Rebased Adam Szita's work [13] which refactored out the Lock related features on a single class 2. Added a small patch which adds the possibility to use the new feature and turn of the locking [14] for HiveCatalog tables Using the new feature would be beneficial in multiple ways: - Enhances stability of the writers - Improves the performance of the commits, as fewer number of HMS calls are needed for a commit operation My question for the community is: - Do we want to merge both PRs to the Iceberg code, or do we want to merge only the refactor PR (1) to the Iceberg code and only merge the PR with the new configuration (2) after there is an Apache Hive release with the new Hive feature? I would vote for merging the configuration PR too because of the following reasons: - Many companies are using their own fork of Hive - they do not need to wait for the Apache release to add a new feature - The PR itself is small, and by default it is turned off - will not cause to many issues with maintaining the code - We will not forget to add the change when the Apache Hive is released What are your thoughts? If you have time, please review the PR. Thanks, Peter [1] Should do the best efforts to release hive table lock #3336 - https://github.com/apache/iceberg/pull/3336 [2] Lock remains in HMS if HiveTableOperations gets killed (direct process shutdown - no signals) after lock is acquired #2301 - https://github.com/apache/iceberg/pull/2301 [3] What is the purpose of Hive Lock ? #6370 - https://github.com/apache/iceberg/issues/6370 [4] Hive: Lock hardening #6451 - https://github.com/apache/iceberg/pull/6451 [5] HIVE-26882: Allow transactional check of Table parameter before altering the Table - https://issues.apache.org/jira/browse/HIVE-26882 [6] master/HIVE-26882: https://github.com/apache/hive/pull/3888 [7] branch-3/HIVE-26882: https://github.com/apache/hive/pull/3943 [8] branch-3.1/HIVE-26882: : https://github.com/apache/hive/pull/3944 [9] branch-2/HIVE-26882: : https://github.com/apache/hive/pull/3946 [10] branch-2.1/HIVE-26882: : https://github.com/apache/hive/pull/3947 [11] Upcoming Hive release: https://issues.apache.org/jira/browse/HIVE-26882?focusedCommentId=17677002&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17677002 [12] Hive: Use EnvironmentContext instead of Hive Locks to provide transactional commits after HIVE-26882 - https://github.com/apache/iceberg/pull/6570 [13] Refactor commit lock mechanism from HiveTableOperations #5877 - https://github.com/apache/iceberg/pull/5877 [14] Hive: Use EnvironmentContext instead of Hive Locks to provide transactional commits after HIVE-26882 - https://github.com/apache/iceberg/pull/6570/commits/dd38a44fb4aa9c6423a7474e9d99258b38a2e609