Hi Team,

I have seen several complaints when HiveCatalog was used and the locks were
not removed correctly [1], [2] and suggested solutions [3]. Also did my
share to fix the situation as much as possible [4]. The issue is that every
fix is just a band-aid when the client can disappear and leave behind an
abandoned lock, this way preventing further changes on the given Iceberg
table.

To fix this I propose the following solution:
1. We can provide a minimal Hive feature to check the previous metadata
location transactionally before altering a table and setting the new
snapshot
2. We can have a configuration to turn off using the Hive locks in the
HiveCatalog and we can rely on the new Hive feature to provide us the
atomicity which is needed for an Iceberg commit

I have created the above mentioned minimal Hive feature request (HIVE-26882
[5]) and the PR to fix it [6] also, with the approval of the Hive community
backported and merged this feature to the active branches: branch-3 [7],
branch-3.1 [8], branch-2 [9], branch-2.3 [10]. This means that any of the
next Hive releases will contain this fix, and any company having their own
forks could easily cherry-pick this minimal change. The first upcoming
upstream Hive release with this feature would be the Hive 3.2.0 which is
expected to come around 2-3 months [11].

Also I have created the Iceberg PR to start using the new feature [12].
This has 2 parts:
1. Rebased Adam Szita's work [13] which refactored out the Lock related
features on a single class
2. Added a small patch which adds the possibility to use the new feature
and turn of the locking [14] for HiveCatalog tables

Using the new feature would be beneficial in multiple ways:
- Enhances stability of the writers
- Improves the performance of the commits, as fewer number of HMS calls are
needed for a commit operation

My question for the community is:
- Do we want to merge both PRs to the Iceberg code, or do we want to merge
only the refactor PR (1) to the Iceberg code and only merge the PR with the
new configuration (2) after there is an Apache Hive release with the new
Hive feature?

I would vote for merging the configuration PR too because of the following
reasons:
- Many companies are using their own fork of Hive - they do not need to
wait for the Apache release to add a new feature
- The PR itself is small, and by default it is turned off - will not cause
to many issues with maintaining the code
- We will not forget to add the change when the Apache Hive is released

What are your thoughts?

If you have time, please review the PR.

Thanks,
Peter

[1] Should do the best efforts to release hive table lock #3336 -
https://github.com/apache/iceberg/pull/3336
[2] Lock remains in HMS if HiveTableOperations gets killed (direct process
shutdown - no signals) after lock is acquired  #2301 -
https://github.com/apache/iceberg/pull/2301
[3] What is the purpose of Hive Lock ? #6370 -
https://github.com/apache/iceberg/issues/6370
[4] Hive: Lock hardening #6451 - https://github.com/apache/iceberg/pull/6451
[5] HIVE-26882: Allow transactional check of Table parameter before
altering the Table - https://issues.apache.org/jira/browse/HIVE-26882
[6] master/HIVE-26882: https://github.com/apache/hive/pull/3888

[7] branch-3/HIVE-26882: https://github.com/apache/hive/pull/3943

[8] branch-3.1/HIVE-26882: : https://github.com/apache/hive/pull/3944

[9] branch-2/HIVE-26882: : https://github.com/apache/hive/pull/3946

[10] branch-2.1/HIVE-26882: : https://github.com/apache/hive/pull/3947

[11] Upcoming Hive release:
https://issues.apache.org/jira/browse/HIVE-26882?focusedCommentId=17677002&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17677002

[12] Hive: Use EnvironmentContext instead of Hive Locks to provide
transactional commits after HIVE-26882 -
https://github.com/apache/iceberg/pull/6570

[13] Refactor commit lock mechanism from HiveTableOperations #5877 -
https://github.com/apache/iceberg/pull/5877

[14] Hive: Use EnvironmentContext instead of Hive Locks to provide
transactional
commits after HIVE-26882 -
https://github.com/apache/iceberg/pull/6570/commits/dd38a44fb4aa9c6423a7474e9d99258b38a2e609

Reply via email to