Manoj Govindassamy created HUDI-3029:
----------------------------------------

             Summary: TransactionManager synchronized begin/endTransaction() 
leading to deadlock 
                 Key: HUDI-3029
                 URL: https://issues.apache.org/jira/browse/HUDI-3029
             Project: Apache Hudi
          Issue Type: Task
          Components: Writer Core
            Reporter: Manoj Govindassamy
            Assignee: Manoj Govindassamy
             Fix For: 0.11.0


TransactionManager declares beginTransaction() and endTransaction() as 
synchronized methods. Depending on the lock provider implementation, this can 
have adverse effects. If the lock provider's lock() or tryLock() is a blocking 
call (which is generally the case), the following sequence leads to a 
deadlock:

Client 1: beginTransaction() => txn manager instance lock acquired, lock() 
went through, instance lock released

Client 2: beginTransaction() => txn manager instance lock acquired, lock() is 
blocking, so the instance lock stays held

Client 3: endTransaction() => waiting to lock the txn manager instance to enter 
the synchronized method
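
The sequence above can be reproduced with a minimal sketch. The class names 
here (SynchronizedTxnManager, DeadlockDemo) are hypothetical, and a 
java.util.concurrent.Semaphore stands in for a lock provider whose lock() 
blocks; this is not Hudi code, only an illustration of the monitor 
interaction under those assumptions.

```java
import java.util.concurrent.Semaphore;

// Hypothetical stand-in for TransactionManager. The Semaphore plays the role
// of a lock provider whose lock() blocks until the lock is free.
class SynchronizedTxnManager {
  private final Semaphore providerLock = new Semaphore(1);

  synchronized void beginTransaction() {
    try {
      providerLock.acquire(); // may block while the instance monitor is held
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for the lock", e);
    }
  }

  synchronized void endTransaction() {
    providerLock.release();
  }
}

class DeadlockDemo {
  // Returns true if endTransaction() was still blocked after waitMillis (> 0).
  static boolean endTransactionIsStuck(long waitMillis) {
    SynchronizedTxnManager txn = new SynchronizedTxnManager();
    txn.beginTransaction(); // client 1: lock() went through, monitor released

    Thread client2 = new Thread(() -> {
      try {
        txn.beginTransaction(); // client 2: parks in acquire(), monitor held
      } catch (IllegalStateException e) {
        // interrupted below so that the demo can terminate
      }
    });
    Thread ender = new Thread(txn::endTransaction); // has to enter the monitor
    try {
      client2.start();
      while (client2.getState() != Thread.State.WAITING) {
        Thread.sleep(10); // wait until client 2 is parked inside acquire()
      }
      ender.start();
      ender.join(waitMillis);
      boolean stuck = ender.isAlive();
      client2.interrupt(); // break the deadlock by hand
      ender.join();
      client2.join();
      return stuck;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("endTransaction() stuck: " + endTransactionIsStuck(500));
  }
}
```

In this sketch the thread calling endTransaction() only gets through after 
the blocked beginTransaction() caller is interrupted by hand; without that 
interrupt neither thread would make progress.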

{noformat}
public synchronized void beginTransaction(Option<HoodieInstant> currentTxnOwnerInstant,
                                          Option<HoodieInstant> lastCompletedTxnOwnerInstant) {
  if (supportsOptimisticConcurrency) {
    this.lastCompletedTxnOwnerInstant = lastCompletedTxnOwnerInstant;
    lockManager.setLatestCompletedWriteInstant(lastCompletedTxnOwnerInstant);
    LOG.info("Latest completed transaction instant " + lastCompletedTxnOwnerInstant);
    this.currentTxnOwnerInstant = currentTxnOwnerInstant;
    LOG.info("Transaction starting with transaction owner " + currentTxnOwnerInstant);
    lockManager.lock();
    LOG.info("Transaction started");
  }
}

public synchronized void endTransaction() {
  if (supportsOptimisticConcurrency) {
    LOG.info("Transaction ending with transaction owner " + currentTxnOwnerInstant);
    lockManager.unlock();
    LOG.info("Transaction ended");
    this.lastCompletedTxnOwnerInstant = Option.empty();
    lockManager.resetLatestCompletedWriteInstant();
  }
}
{noformat}

The current model may only be working because the lock provider's tryLock() 
implementation sleeps or retries with a timeout. But the transaction manager 
layer can't make assumptions about the lock provider implementation.
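
One possible direction, shown only as a hedged sketch and not as the actual 
patch for this issue, is to make the blocking lock call outside any monitor 
and guard just the bookkeeping fields. The names below 
(UnsynchronizedTxnManager, FixDemo, the String owner field) are hypothetical, 
and a Semaphore again stands in for the lock provider.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch: begin/endTransaction are not synchronized, so a caller
// blocked on the provider lock never holds the instance monitor.
class UnsynchronizedTxnManager {
  private final Semaphore providerLock = new Semaphore(1); // stand-in lock provider
  private String currentTxnOwner; // guarded by "this"

  void beginTransaction(String owner) {
    providerLock.acquireUninterruptibly(); // may block, but holds no monitor
    synchronized (this) {
      currentTxnOwner = owner; // state is updated only once the lock is held
    }
  }

  void endTransaction() {
    synchronized (this) {
      currentTxnOwner = null;
    }
    providerLock.release(); // does not wait behind a blocked beginTransaction()
  }

  synchronized String currentOwner() {
    return currentTxnOwner;
  }
}

class FixDemo {
  // Runs the same interleaving as the reported sequence and returns the final owner.
  static String run() {
    UnsynchronizedTxnManager txn = new UnsynchronizedTxnManager();
    txn.beginTransaction("writer-1");
    Thread writer2 = new Thread(() -> txn.beginTransaction("writer-2"));
    try {
      writer2.start();
      while (writer2.getState() != Thread.State.WAITING) {
        Thread.sleep(10); // wait until writer 2 is parked on the provider lock
      }
      txn.endTransaction(); // proceeds even though writer 2 is blocked
      writer2.join();       // writer 2 now obtains the lock and returns
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return txn.currentOwner();
  }

  public static void main(String[] args) {
    System.out.println("owner after handoff: " + FixDemo.run());
  }
}
```

The design choice in this sketch is that the owner field is written only 
while the provider lock is held, so the short synchronized blocks never wrap 
a call that can block indefinitely.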

 

cc: [~nishith29]  [~shivnarayan] 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
