[ https://issues.apache.org/jira/browse/HUDI-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manoj Govindassamy updated HUDI-3029:
-------------------------------------
    Description: 
I see that TransactionManager declares beginTransaction() and endTransaction() 
as synchronized methods. Depending on the lock provider implementation, this 
can have adverse effects. If the lock provider's lock() or tryLock() is a 
blocking call (which is generally the case), then the following sequence, with 
both clients sharing the same TransactionManager instance, leads to a deadlock.

Client 1: beginTransaction() => txn manager instance lock acquired, lock() 
went through, instance lock released

Client 2: beginTransaction() => txn manager instance lock acquired, lock() is 
blocking

Client 1: endTransaction() => waiting to lock the txn manager instance to 
enter the synchronized method

Client 2 holds the txn manager instance lock while it waits for the provider 
lock owned by Client 1, and Client 1 cannot release the provider lock until it 
gets the instance lock, so neither client can make progress.

 

 
{noformat}
public synchronized void beginTransaction(Option<HoodieInstant> currentTxnOwnerInstant,
    Option<HoodieInstant> lastCompletedTxnOwnerInstant) {
  if (supportsOptimisticConcurrency) {
    this.lastCompletedTxnOwnerInstant = lastCompletedTxnOwnerInstant;
    lockManager.setLatestCompletedWriteInstant(lastCompletedTxnOwnerInstant);
    LOG.info("Latest completed transaction instant " + lastCompletedTxnOwnerInstant);
    this.currentTxnOwnerInstant = currentTxnOwnerInstant;
    LOG.info("Transaction starting with transaction owner " + currentTxnOwnerInstant);
    // May block (provider-dependent) while this instance's monitor is still held
    lockManager.lock();
    LOG.info("Transaction started");
  }
}

public synchronized void endTransaction() {
  if (supportsOptimisticConcurrency) {
    LOG.info("Transaction ending with transaction owner " + currentTxnOwnerInstant);
    // Reached only after acquiring this instance's monitor
    lockManager.unlock();
    LOG.info("Transaction ended");
    this.lastCompletedTxnOwnerInstant = Option.empty();
    lockManager.resetLatestCompletedWriteInstant();
  }
}{noformat}
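To make the sequence above concrete, here is a minimal, self-contained Java sketch that models it outside Hudi. It assumes a lock provider whose lock() blocks indefinitely. BlockingLockProvider, SynchronizedTxnManager and DeadlockDemo are hypothetical names, not Hudi classes, and a java.util.concurrent.Semaphore stands in for the provider lock because, like a distributed lock, it is not tied to the thread that acquired it.

```java
import java.util.concurrent.Semaphore;

// Hypothetical stand-in for a lock provider whose lock() blocks indefinitely.
// A Semaphore is used because, like a distributed lock, it is not tied to the
// thread that acquired it.
class BlockingLockProvider {
  private final Semaphore permit = new Semaphore(1);

  void lock() {
    permit.acquireUninterruptibly(); // parks the caller until unlock() is called
  }

  void unlock() {
    permit.release();
  }
}

// Hypothetical, simplified model of the synchronized begin/endTransaction()
// quoted above; logging and instant bookkeeping are omitted.
class SynchronizedTxnManager {
  private final BlockingLockProvider lockProvider = new BlockingLockProvider();

  synchronized void beginTransaction() {
    lockProvider.lock(); // may block while still holding this instance's monitor
  }

  synchronized void endTransaction() {
    lockProvider.unlock(); // needs the monitor before it can release the provider lock
  }
}

class DeadlockDemo {
  // Returns true if Client 1's endTransaction() is still stuck after waitMillis.
  static boolean reproduce(long waitMillis) throws InterruptedException {
    SynchronizedTxnManager txnManager = new SynchronizedTxnManager();

    // Client 1: beginTransaction() gets the provider lock, then releases the monitor.
    txnManager.beginTransaction();

    // Client 2: beginTransaction() takes the monitor and blocks inside lock().
    Thread client2 = new Thread(txnManager::beginTransaction);
    client2.setDaemon(true); // never finishes, so don't keep the JVM alive
    client2.start();
    // WAITING is only reached once Client 2 is parked inside lock(), holding the monitor.
    while (client2.isAlive() && client2.getState() != Thread.State.WAITING) {
      Thread.sleep(1);
    }

    // Client 1: endTransaction() now waits for the monitor held by Client 2.
    // It runs on a separate daemon thread only so this demo can observe the hang.
    Thread client1End = new Thread(txnManager::endTransaction);
    client1End.setDaemon(true);
    client1End.start();
    client1End.join(waitMillis);
    return client1End.isAlive();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("deadlocked: " + reproduce(500)); // prints "deadlocked: true"
  }
}
```

Client 2 stays parked in lock() while holding the monitor, and Client 1's endTransaction() stays BLOCKED on that monitor, so the hang does not resolve no matter how long the demo waits.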
 

 

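The current model may be working only because the lock provider implementation 
of tryLock() uses sleep() or retries with a timeout, i.e. it does not block 
indefinitely, so a waiting beginTransaction() eventually returns and releases 
the txn manager instance lock. But we can't make assumptions about the lock 
provider implementation at the transaction manager layer.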
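Since the transaction manager can't rely on provider behavior, one possible direction (a sketch only, not necessarily how this issue will be fixed) is to never block on the provider lock while holding the txn manager instance lock, and to use the monitor only to guard in-memory state. ReorderedTxnManager and ReorderedDemo are hypothetical names, a String stands in for the owner instant, logging and the lastCompletedTxnOwnerInstant bookkeeping are omitted, and the same hypothetical BlockingLockProvider is repeated so the block compiles on its own.

```java
import java.util.concurrent.Semaphore;

// Hypothetical blocking lock provider, repeated so this block compiles alone.
class BlockingLockProvider {
  private final Semaphore permit = new Semaphore(1);

  void lock() {
    permit.acquireUninterruptibly(); // parks the caller until unlock() is called
  }

  void unlock() {
    permit.release();
  }
}

// Hypothetical restructuring: block on the provider lock without holding the
// manager's monitor, and use the monitor only to guard in-memory state.
class ReorderedTxnManager {
  private final BlockingLockProvider lockProvider = new BlockingLockProvider();
  private String currentTxnOwner; // hypothetical stand-in for currentTxnOwnerInstant

  void beginTransaction(String owner) {
    lockProvider.lock(); // may block, but no monitor is held here
    synchronized (this) {
      currentTxnOwner = owner;
    }
  }

  void endTransaction() {
    synchronized (this) {
      currentTxnOwner = null; // clear state before the next owner can get the lock
    }
    lockProvider.unlock(); // released outside the monitor
  }

  synchronized String currentTxnOwner() {
    return currentTxnOwner;
  }
}

class ReorderedDemo {
  // Replays the sequence from the issue; returns true if it completes.
  static boolean replay(long waitMillis) throws InterruptedException {
    ReorderedTxnManager txnManager = new ReorderedTxnManager();
    txnManager.beginTransaction("client-1"); // Client 1 begins

    // Client 2 blocks in lock(), this time without holding the monitor.
    Thread client2 = new Thread(() -> txnManager.beginTransaction("client-2"));
    client2.setDaemon(true);
    client2.start();
    while (client2.isAlive() && client2.getState() != Thread.State.WAITING) {
      Thread.sleep(1);
    }

    txnManager.endTransaction(); // Client 1 ends; the monitor is free, so this returns
    client2.join(waitMillis);    // Client 2 then acquires the provider lock and finishes
    return !client2.isAlive() && "client-2".equals(txnManager.currentTxnOwner());
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("completed: " + replay(5000)); // prints "completed: true"
  }
}
```

Clearing the owner before unlock() matters: with the opposite order, the next owner could record itself between unlock() and the reset and then be overwritten. The sketch also does not check that the thread calling endTransaction() is the current owner.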

 

cc: [~nishith29]  [~shivnarayan] 

> TransactionManager synchronized begin/endTransaction() leading to deadlock 
> ---------------------------------------------------------------------------
>
>                 Key: HUDI-3029
>                 URL: https://issues.apache.org/jira/browse/HUDI-3029
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: Writer Core
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.11.0
>
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
