[
https://issues.apache.org/jira/browse/HUDI-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sivabalan narayanan updated HUDI-3029:
--------------------------------------
Status: Resolved (was: Patch Available)
> TransactionManager synchronized begin/endTransaction() leading to deadlock
> ---------------------------------------------------------------------------
>
> Key: HUDI-3029
> URL: https://issues.apache.org/jira/browse/HUDI-3029
> Project: Apache Hudi
> Issue Type: Task
> Components: Writer Core
> Reporter: Manoj Govindassamy
> Assignee: Manoj Govindassamy
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.11.0
>
>
> I see that the TransactionManager declares beginTransaction() and
> endTransaction() as synchronized methods. Depending on the lock provider
> implementation, this can have adverse effects. If the lock provider's lock()
> or tryLock() is a blocking call (which is generally the case), then the
> following sequence will lead to a deadlock:
> Client 1: beginTransaction() => txn manager instance lock acquired, lock()
> went through, instance lock released
> Client 2: beginTransaction() => txn manager instance lock acquired, lock() is
> blocking
> Client 1: endTransaction() => Waiting to lock the txn manager instance to
> enter the synchronized method
>
>
> {noformat}
> public synchronized void beginTransaction(Option<HoodieInstant> currentTxnOwnerInstant,
>     Option<HoodieInstant> lastCompletedTxnOwnerInstant) {
>   if (supportsOptimisticConcurrency) {
>     this.lastCompletedTxnOwnerInstant = lastCompletedTxnOwnerInstant;
>     lockManager.setLatestCompletedWriteInstant(lastCompletedTxnOwnerInstant);
>     LOG.info("Latest completed transaction instant " + lastCompletedTxnOwnerInstant);
>     this.currentTxnOwnerInstant = currentTxnOwnerInstant;
>     LOG.info("Transaction starting with transaction owner " + currentTxnOwnerInstant);
>     lockManager.lock();
>     LOG.info("Transaction started");
>   }
> }
>
> public synchronized void endTransaction() {
>   if (supportsOptimisticConcurrency) {
>     LOG.info("Transaction ending with transaction owner " + currentTxnOwnerInstant);
>     lockManager.unlock();
>     LOG.info("Transaction ended");
>     this.lastCompletedTxnOwnerInstant = Option.empty();
>     lockManager.resetLatestCompletedWriteInstant();
>   }
> }
> {noformat}
>
>
> The current model may be working only because the lock provider's tryLock()
> implementation uses sleep() or retries with a timeout, etc. But the
> transaction manager layer can't make assumptions about the lock provider
> implementation.
>
> cc: [~nishith29] [~shivnarayan]
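
The sequence in the description can be reproduced in isolation. The sketch below is not Hudi code: BlockingLockProvider, SynchronizedTxnManager, UnsynchronizedTxnManager and endCompletes() are hypothetical names introduced only for illustration, and the lock provider is assumed to block indefinitely in lock(). Client 2 holds the instance monitor while it waits inside lock(), and lock() cannot return until Client 1 gets into endTransaction() to call unlock(), so neither side makes progress. Dropping synchronized from the methods is shown as one possible way out, not necessarily the change that resolved this issue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class TxnDeadlockSketch {

  /** Hypothetical lock provider: lock() blocks until the single permit is free. */
  static class BlockingLockProvider {
    private final Semaphore permit = new Semaphore(1);
    void lock() { permit.acquireUninterruptibly(); }
    void unlock() { permit.release(); }
  }

  interface TxnManager {
    void beginTransaction();
    void endTransaction();
  }

  /** Mirrors the reported shape: the instance monitor is held while lock() blocks. */
  static class SynchronizedTxnManager implements TxnManager {
    private final BlockingLockProvider lockProvider = new BlockingLockProvider();
    @Override public synchronized void beginTransaction() { lockProvider.lock(); }
    @Override public synchronized void endTransaction() { lockProvider.unlock(); }
  }

  /** Assumed alternative: the instance monitor is never held across lock(). */
  static class UnsynchronizedTxnManager implements TxnManager {
    private final BlockingLockProvider lockProvider = new BlockingLockProvider();
    @Override public void beginTransaction() { lockProvider.lock(); }
    @Override public void endTransaction() { lockProvider.unlock(); }
  }

  /** Replays the three steps; returns true if Client 1's endTransaction() finishes within 1s. */
  static boolean endCompletes(TxnManager mgr) {
    try {
      // Client 1: beginTransaction() goes through and takes the provider lock.
      mgr.beginTransaction();

      // Client 2: beginTransaction() blocks inside lock(). Daemon thread so the JVM can still exit.
      Thread client2 = new Thread(mgr::beginTransaction);
      client2.setDaemon(true);
      client2.start();
      while (client2.getState() != Thread.State.WAITING) {
        Thread.sleep(10); // wait until Client 2 is parked inside lock()
      }

      // Client 1: endTransaction(), run on a helper thread so we can time out instead of hanging.
      CountDownLatch ended = new CountDownLatch(1);
      Thread client1End = new Thread(() -> {
        mgr.endTransaction();
        ended.countDown();
      });
      client1End.setDaemon(true);
      client1End.start();
      return ended.await(1, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("synchronized:   endTransaction completed = "
        + endCompletes(new SynchronizedTxnManager()));
    System.out.println("unsynchronized: endTransaction completed = "
        + endCompletes(new UnsynchronizedTxnManager()));
  }
}
```

With the synchronized variant the helper thread never gets past the instance monitor, so endCompletes() times out and returns false; with the unsynchronized variant unlock() runs and Client 2 is released. A real transaction manager would still need to guard its own fields (for example currentTxnOwnerInstant), which this sketch leaves out.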
--
This message was sent by Atlassian Jira
(v8.20.1#820001)