[ https://issues.apache.org/jira/browse/HUDI-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-3029:
--------------------------------------
    Status: Resolved  (was: Patch Available)

> TransactionManager synchronized begin/endTransaction() leading to deadlock 
> ---------------------------------------------------------------------------
>
>                 Key: HUDI-3029
>                 URL: https://issues.apache.org/jira/browse/HUDI-3029
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: Writer Core
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.11.0
>
>
> I see that TransactionManager declares beginTransaction() and
> endTransaction() as synchronized methods. Depending on the lock provider
> implementation, this can have adverse effects. If the lock provider's lock()
> or tryLock() call blocks (which is generally the case), the following
> sequence, with two clients sharing the same TransactionManager instance,
> leads to a deadlock:
> Client 1: beginTransaction() => txn manager instance lock acquired, lock()
> went through, instance lock released
> Client 2: beginTransaction() => txn manager instance lock acquired, lock() is
> blocking (the instance lock stays held)
> Client 1: endTransaction() => waiting to lock the txn manager instance to
> enter the synchronized method, so it never reaches unlock() and Client 2
> stays blocked
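The sequence above can be replayed with a small, self-contained model. This is a sketch under stated assumptions, not Hudi code: `BlockingLockProvider`, `SyncTxnManager` and `DeadlockDemo` are hypothetical names, and a single-permit `java.util.concurrent.Semaphore` stands in for a lock provider whose `lock()` blocks until the lock is free. The demo checks that Client 1's `endTransaction()` ends up `BLOCKED` on the manager's monitor while Client 2 is parked inside `beginTransaction()`.

```java
import java.util.concurrent.Semaphore;

// Hypothetical stand-in for a lock provider whose lock() blocks until the
// lock is free, modelled with a single-permit Semaphore.
class BlockingLockProvider {
  private final Semaphore permit = new Semaphore(1);

  void lock() throws InterruptedException {
    permit.acquire();
  }

  void unlock() {
    permit.release();
  }
}

// Simplified model of the quoted TransactionManager: both methods are
// synchronized on the manager instance.
class SyncTxnManager {
  private final BlockingLockProvider lockProvider = new BlockingLockProvider();

  public synchronized void beginTransaction() throws InterruptedException {
    lockProvider.lock(); // may block while this instance's monitor is held
  }

  public synchronized void endTransaction() {
    lockProvider.unlock();
  }
}

public class DeadlockDemo {
  // Polls until thread t reaches the given state or the timeout expires.
  static boolean awaitState(Thread t, Thread.State state, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (t.getState() != state && System.currentTimeMillis() < deadline) {
      Thread.sleep(10);
    }
    return t.getState() == state;
  }

  // Replays the sequence from the issue; returns true if client 1's
  // endTransaction() ends up BLOCKED on the manager's monitor.
  static boolean reproduce() throws InterruptedException {
    SyncTxnManager txnManager = new SyncTxnManager();
    txnManager.beginTransaction(); // Client 1: lock() succeeds, monitor released

    Thread client2 = new Thread(() -> {
      try {
        txnManager.beginTransaction(); // Client 2: parks in lock() inside the monitor
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    client2.setDaemon(true); // never finishes; must not keep the JVM alive
    client2.start();
    awaitState(client2, Thread.State.WAITING, 2000);

    Thread client1End = new Thread(txnManager::endTransaction); // Client 1 tries to finish
    client1End.setDaemon(true);
    client1End.start();
    return awaitState(client1End, Thread.State.BLOCKED, 2000);
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("client 1 stuck in endTransaction(): " + reproduce());
  }
}
```

Both worker threads are daemons because, by construction, neither can ever finish: Client 1's end thread waits for the monitor held by Client 2, and Client 2 waits for the permit that only Client 1's `endTransaction()` would release.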
>  
>  
> {noformat}
> // Both methods are synchronized on the TransactionManager instance.
> public synchronized void beginTransaction(Option<HoodieInstant> currentTxnOwnerInstant, Option<HoodieInstant> lastCompletedTxnOwnerInstant) {
>   if (supportsOptimisticConcurrency) {
>     this.lastCompletedTxnOwnerInstant = lastCompletedTxnOwnerInstant;
>     lockManager.setLatestCompletedWriteInstant(lastCompletedTxnOwnerInstant);
>     LOG.info("Latest completed transaction instant " + lastCompletedTxnOwnerInstant);
>     this.currentTxnOwnerInstant = currentTxnOwnerInstant;
>     LOG.info("Transaction starting with transaction owner " + currentTxnOwnerInstant);
>     // Depending on the lock provider, this can block while the instance monitor is held.
>     lockManager.lock();
>     LOG.info("Transaction started");
>   }
> }
>
> public synchronized void endTransaction() {
>   if (supportsOptimisticConcurrency) {
>     LOG.info("Transaction ending with transaction owner " + currentTxnOwnerInstant);
>     // unlock() cannot be reached while another thread holds the monitor blocked in beginTransaction().
>     lockManager.unlock();
>     LOG.info("Transaction ended");
>     this.lastCompletedTxnOwnerInstant = Option.empty();
>     lockManager.resetLatestCompletedWriteInstant();
>   }
> }{noformat}
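The quoted methods hold the instance monitor across `lockManager.lock()`. One way to avoid that, sketched here under assumptions and not necessarily what the merged patch does, is to take the provider lock outside any monitor and synchronize only the short updates of shared fields. `NonBlockingMonitorTxnManager` and `FixedSequenceDemo` are hypothetical names; a `Semaphore` stands in for the lock provider and a `String` owner stands in for `Option<HoodieInstant>`.

```java
import java.util.Optional;
import java.util.concurrent.Semaphore;

// Sketch of a manager that never holds its own monitor while waiting for the
// provider lock. Semaphore and String owners are hypothetical stand-ins for
// the lock provider and Option<HoodieInstant>.
class NonBlockingMonitorTxnManager {
  private final Semaphore providerLock = new Semaphore(1);
  private Optional<String> currentTxnOwner = Optional.empty();

  public void beginTransaction(String owner) throws InterruptedException {
    providerLock.acquire();       // may block, but no monitor is held here
    setOwner(Optional.of(owner)); // short critical section after acquiring
  }

  public void endTransaction() {
    setOwner(Optional.empty());   // clear state while still holding the provider lock
    providerLock.release();
  }

  private synchronized void setOwner(Optional<String> owner) {
    currentTxnOwner = owner;
  }

  public synchronized Optional<String> currentOwner() {
    return currentTxnOwner;
  }
}

public class FixedSequenceDemo {
  // Replays the issue's sequence; returns the owner recorded once client 2's
  // beginTransaction() has completed, or empty if client 2 is still stuck.
  static Optional<String> replay() throws InterruptedException {
    NonBlockingMonitorTxnManager txnManager = new NonBlockingMonitorTxnManager();
    txnManager.beginTransaction("client-1"); // Client 1 acquires the provider lock

    Thread client2 = new Thread(() -> {
      try {
        txnManager.beginTransaction("client-2"); // Client 2 waits in acquire()
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    client2.setDaemon(true);
    client2.start();
    long deadline = System.currentTimeMillis() + 2000;
    while (client2.getState() != Thread.State.WAITING
        && System.currentTimeMillis() < deadline) {
      Thread.sleep(10); // give client 2 time to block on the provider lock
    }

    txnManager.endTransaction(); // Client 1 is not blocked by client 2
    client2.join(2000);
    return client2.isAlive() ? Optional.empty() : txnManager.currentOwner();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("owner after replay: " + replay());
  }
}
```

Because the owner fields are written only while the provider lock is held, Client 2 cannot overwrite Client 1's state, and the monitor is only ever held for a field assignment, never across a blocking call.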
>  
>  
> It may appear to work with the current model because some lock provider
> implementations of tryLock() sleep and retry, or give up after a timeout, so
> a blocked beginTransaction() eventually fails and releases the instance lock.
> But the transaction manager layer cannot make assumptions about the lock
> provider implementation.
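To see why a timeout only masks the problem, here is a sketch that keeps the synchronized shape of the quoted methods but assumes a hypothetical lock provider that gives up after 200 ms (`TimeoutTxnManager` and `TimeoutMasksDeadlock` are illustrative names; `Semaphore.tryAcquire(long, TimeUnit)` models the timed `tryLock()`). Client 1's `endTransaction()` does complete, but only after Client 2's `beginTransaction()` has failed and thrown out of the monitor.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Same synchronized shape as the quoted methods, but the hypothetical lock
// provider gives up after a timeout instead of blocking forever.
class TimeoutTxnManager {
  private final Semaphore providerLock = new Semaphore(1);
  private final long timeoutMs;

  TimeoutTxnManager(long timeoutMs) {
    this.timeoutMs = timeoutMs;
  }

  public synchronized void beginTransaction() throws InterruptedException {
    if (!providerLock.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)) {
      // Throwing leaves the synchronized method and frees the monitor.
      throw new IllegalStateException("Unable to acquire lock within " + timeoutMs + " ms");
    }
  }

  public synchronized void endTransaction() {
    providerLock.release();
  }
}

public class TimeoutMasksDeadlock {
  // Replays the issue's sequence; returns true when client 1's endTransaction()
  // completed and client 2's beginTransaction() had timed out first.
  static boolean replay() throws InterruptedException {
    TimeoutTxnManager txnManager = new TimeoutTxnManager(200);
    txnManager.beginTransaction(); // Client 1 holds the provider lock

    AtomicBoolean client2TimedOut = new AtomicBoolean(false);
    Thread client2 = new Thread(() -> {
      try {
        txnManager.beginTransaction(); // waits in tryAcquire() while holding the monitor
      } catch (IllegalStateException e) {
        client2TimedOut.set(true);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    client2.start();
    // Wait until client 2 is inside the monitor (or has already given up).
    while (client2.getState() != Thread.State.TIMED_WAITING
        && client2.getState() != Thread.State.TERMINATED) {
      Thread.sleep(5);
    }

    txnManager.endTransaction(); // stalls until client 2's timeout frees the monitor
    client2.join();
    return client2TimedOut.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("client 2 timed out before client 1 finished: " + replay());
  }
}
```

In this model Client 1 stalls for up to the full timeout and Client 2's transaction fails even though the lock was about to become free, which is why relying on provider timeouts at the transaction manager layer is fragile.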
>  
> cc: [~nishith29]  [~shivnarayan] 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
