[
https://issues.apache.org/jira/browse/HUDI-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Manoj Govindassamy updated HUDI-3029:
-------------------------------------
Description:
TransactionManager declares beginTransaction() and endTransaction() as
synchronized methods. Depending on the lock provider implementation, this can
deadlock. If the lock provider's lock() or tryLock() is a blocking call (which
is generally the case), the following sequence leads to a deadlock:
Client 1: beginTransaction() => txn manager instance lock acquired, lock()
went through, instance lock released
Client 2: beginTransaction() => txn manager instance lock acquired, lock() is
blocking
Client 1: endTransaction() => waiting to lock the txn manager instance to enter
the synchronized method
Client 2 holds the instance lock while it waits for the provider lock, and
Client 1 cannot reach unlock() without the instance lock, so neither can proceed.
{noformat}
public synchronized void beginTransaction(Option<HoodieInstant> currentTxnOwnerInstant,
    Option<HoodieInstant> lastCompletedTxnOwnerInstant) {
  if (supportsOptimisticConcurrency) {
    this.lastCompletedTxnOwnerInstant = lastCompletedTxnOwnerInstant;
    lockManager.setLatestCompletedWriteInstant(lastCompletedTxnOwnerInstant);
    LOG.info("Latest completed transaction instant " + lastCompletedTxnOwnerInstant);
    this.currentTxnOwnerInstant = currentTxnOwnerInstant;
    LOG.info("Transaction starting with transaction owner " + currentTxnOwnerInstant);
    lockManager.lock();
    LOG.info("Transaction started");
  }
}

public synchronized void endTransaction() {
  if (supportsOptimisticConcurrency) {
    LOG.info("Transaction ending with transaction owner " + currentTxnOwnerInstant);
    lockManager.unlock();
    LOG.info("Transaction ended");
    this.lastCompletedTxnOwnerInstant = Option.empty();
    lockManager.resetLatestCompletedWriteInstant();
  }
}
{noformat}
The current model may only be working because some lock provider
implementations of tryLock() sleep or retry with a timeout instead of blocking
indefinitely. But the transaction manager layer cannot make assumptions about
the lock provider implementation.
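
One possible way to avoid the deadlock is to stop holding the instance monitor
while blocking on the provider lock. The sketch below is only an illustration
under stated assumptions, not the Hudi implementation or the fix for this issue:
TxnManagerSketch, runScenario() and currentOwner() are hypothetical names, a
java.util.concurrent ReentrantLock stands in for the lock provider, and a String
stands in for HoodieInstant. The methods are not synchronized, and the owner
field is written only while the provider lock is held.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a transaction manager; not Hudi's class.
class TxnManagerSketch {
    // Assumption: a blocking lock, playing the role of the lock provider.
    private final ReentrantLock providerLock = new ReentrantLock();
    private volatile String currentTxnOwner;

    // Not synchronized: a caller blocked in lock() holds no instance monitor,
    // so the current owner can still enter endTransaction().
    public void beginTransaction(String owner) {
        providerLock.lock();
        currentTxnOwner = owner; // updated only after the lock is acquired
    }

    public void endTransaction() {
        currentTxnOwner = null;  // cleared while the lock is still held
        providerLock.unlock();
    }

    public String currentOwner() {
        return currentTxnOwner;
    }

    // Replays the sequence from the description: client 1 begins, client 2
    // blocks in beginTransaction(), client 1 ends. Returns true if client 2
    // then completes its own transaction within the timeout.
    public static boolean runScenario() {
        TxnManagerSketch mgr = new TxnManagerSketch();
        CountDownLatch client2Done = new CountDownLatch(1);
        try {
            mgr.beginTransaction("client-1");
            Thread client2 = new Thread(() -> {
                mgr.beginTransaction("client-2"); // blocks until client 1 ends
                mgr.endTransaction();
                client2Done.countDown();
            });
            client2.setDaemon(true);
            client2.start();
            Thread.sleep(100);                    // give client 2 time to block
            mgr.endTransaction();                 // must not wait on a monitor
            return client2Done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("completed without deadlock: " + runScenario());
    }
}
```

If both methods in this sketch were declared synchronized, client 2 would hold
the monitor while blocked in lock(), and client 1's endTransaction() call would
wait on that monitor, which is the sequence described above.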
cc: [~nishith29] [~shivnarayan]
was:
I see the TransactionManager has begin and end transactions as synchronized
methods. Based on the lock provider implementation, this can have adverse
effects. Say the lock provider has the blocking call for the lock() or
tryLock() (which is generally the case), then the following sequence will lead
to a deadlock.
Client 1: beginTransaction() => txn manager instance lock acquired, lock()
went through, instance lock released
Client 2: beginTransaction() => txn manager instance lock acquired, lock() is
blocking
Client 3: endTransaction() => Waiting to lock the txn manager instance to enter
the synchronized method
{noformat}
public synchronized void beginTransaction(Option<HoodieInstant> currentTxnOwnerInstant,
    Option<HoodieInstant> lastCompletedTxnOwnerInstant) {
  if (supportsOptimisticConcurrency) {
    this.lastCompletedTxnOwnerInstant = lastCompletedTxnOwnerInstant;
    lockManager.setLatestCompletedWriteInstant(lastCompletedTxnOwnerInstant);
    LOG.info("Latest completed transaction instant " + lastCompletedTxnOwnerInstant);
    this.currentTxnOwnerInstant = currentTxnOwnerInstant;
    LOG.info("Transaction starting with transaction owner " + currentTxnOwnerInstant);
    lockManager.lock();
    LOG.info("Transaction started");
  }
}

public synchronized void endTransaction() {
  if (supportsOptimisticConcurrency) {
    LOG.info("Transaction ending with transaction owner " + currentTxnOwnerInstant);
    lockManager.unlock();
    LOG.info("Transaction ended");
    this.lastCompletedTxnOwnerInstant = Option.empty();
    lockManager.resetLatestCompletedWriteInstant();
  }
}
{noformat}
The current model may only be working because some lock provider
implementations of tryLock() sleep or retry with a timeout instead of blocking
indefinitely. But the transaction manager layer cannot make assumptions about
the lock provider implementation.
cc: [~nishith29] [~shivnarayan]
> TransactionManager synchronized begin/endTransaction() leading to deadlock
> ---------------------------------------------------------------------------
>
> Key: HUDI-3029
> URL: https://issues.apache.org/jira/browse/HUDI-3029
> Project: Apache Hudi
> Issue Type: Task
> Components: Writer Core
> Reporter: Manoj Govindassamy
> Assignee: Manoj Govindassamy
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.11.0