[ 
https://issues.apache.org/jira/browse/IGNITE-22980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vyacheslav Koptilin reassigned IGNITE-22980:
--------------------------------------------

    Assignee: Vladislav Pyatkov

> Lock manager may fail and lock waiter simultaneously
> ----------------------------------------------------
>
>                 Key: IGNITE-22980
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22980
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vladislav Pyatkov
>            Assignee: Vladislav Pyatkov
>            Priority: Major
>              Labels: ignite-3
>
> h3. Motivation
> The behavior was neither anticipated nor planned, but currently we can
> acquire a lock:
> {code:java}
>         private void lock() {
>             // Promote the intended mode to the held mode: the attempt is granted.
>             lockMode = intendedLockMode;
>             intendedLockMode = null;
>             intendedLocks.clear();
>         }
> {code}
> and then make the same waiter fail:
> {code:java}
>         private void fail(LockException e) {
>             // Record the failure; nothing here checks whether lock() already ran.
>             ex = e;
>         }
> {code}
> with nothing preventing both (no assertion check or explicit prohibition).
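A minimal sketch of the missing prohibition, assuming a simplified, hypothetical `GuardedWaiter` class (not Ignite's actual waiter). It keeps the `lockMode`, `intendedLockMode`, `intendedLocks` and `ex` fields from the excerpts, adds a hypothetical `outcome` field, and rejects a second resolution of the same attempt. It uses `RuntimeException` instead of `LockException` to stay self-contained, and it assumes callers already hold the lock manager's monitor, so it is not thread-safe on its own.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified waiter; not Ignite's HeapLockManager code.
// Assumes the caller already holds the lock manager's monitor.
class GuardedWaiter {
    enum LockMode { S, X }

    // How the current lock attempt was resolved.
    enum Outcome { PENDING, LOCKED, FAILED }

    private LockMode lockMode;
    private LockMode intendedLockMode;
    private final Set<LockMode> intendedLocks = EnumSet.noneOf(LockMode.class);
    private RuntimeException ex;
    private Outcome outcome = Outcome.PENDING;

    GuardedWaiter(LockMode intended) {
        intendedLockMode = intended;
        intendedLocks.add(intended);
    }

    // Same steps as the original lock(), but only for an unresolved attempt.
    void lock() {
        requirePending("lock");
        lockMode = intendedLockMode;
        intendedLockMode = null;
        intendedLocks.clear();
        outcome = Outcome.LOCKED;
    }

    // Same as the original fail(), but only for an unresolved attempt.
    void fail(RuntimeException e) {
        requirePending("fail");
        ex = e;
        outcome = Outcome.FAILED;
    }

    // Explicit prohibition: a second resolution of the same attempt is rejected.
    private void requirePending(String op) {
        if (outcome != Outcome.PENDING) {
            throw new IllegalStateException(op + "() on an attempt that is already " + outcome);
        }
    }

    Outcome outcome() { return outcome; }
    LockMode lockMode() { return lockMode; }
    RuntimeException failure() { return ex; }

    public static void main(String[] args) {
        GuardedWaiter w = new GuardedWaiter(LockMode.X);
        w.lock();
        try {
            w.fail(new RuntimeException("conflict"));
        } catch (IllegalStateException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}
```

Running `main` prints `rejected: fail() on an attempt that is already LOCKED`; the granted X mode and the absence of a recorded failure are left intact.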
> Scenario:
>  * tx1 tries to acquire a lock and finds a conflicting transaction, tx2;
>  * the lock manager checks the state and the coordinator of tx2;
>  * the coordinator of tx2 has left, so a TxRecoveryMessage is sent;
>  * the primary replica of the commit partition of tx2 is on the same node, so
> the TxRecoveryMessage is sent locally. It also triggers tx recovery: tx2 is
> finished and tx cleanup is performed locally. All of this happens in the
> same thread, and during tx cleanup the locks of tx2 are released;
>  * the release of the locks of tx2 allows the conflicting waiter of tx1 to
> acquire the lock;
>  * the processing of the conflicting transaction then continues, and #fail is
> called on the same waiter.
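The scenario can be reproduced in a hypothetical single-key model (`ReentrantGrantDemo`, `NaiveLockManager`, `Waiter` and `recoverAndCleanup` are illustrative names, not Ignite classes). It assumes the holder's coordinator has always left and the commit partition primary is local, so recovery runs on every conflict. Because Java monitors are reentrant, `release()` called from recovery succeeds even though the same thread is still inside the synchronized `acquire()`, so the tx1 waiter ends up both locked and failed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical single-key model of the scenario; names are not from Ignite.
class ReentrantGrantDemo {
    static final class Waiter {
        final String txId;
        boolean locked; // set by lock()
        Exception ex;   // set by fail()

        Waiter(String txId) { this.txId = txId; }

        // Like the original excerpts: neither method checks the other.
        void lock() { locked = true; }
        void fail(Exception e) { ex = e; }
    }

    static final class NaiveLockManager {
        private String holder;
        private final Deque<Waiter> waiters = new ArrayDeque<>();

        synchronized void acquire(Waiter w) {
            if (holder == null) {
                holder = w.txId;
                w.lock();
                return;
            }
            waiters.add(w);
            String conflicting = holder;
            // Assumption: the holder's coordinator has left and the commit partition
            // primary is local, so recovery and cleanup run right here, in this thread.
            recoverAndCleanup(conflicting);
            // Conflict processing resumes, unaware that the waiter was just granted.
            w.fail(new Exception("conflict with " + conflicting));
        }

        // Re-entered from acquire(): Java monitors are reentrant, so this does not block.
        synchronized void release(String txId) {
            if (!txId.equals(holder)) {
                return;
            }
            holder = null;
            Waiter next = waiters.poll();
            if (next != null) {
                holder = next.txId;
                next.lock();
            }
        }

        private void recoverAndCleanup(String txId) {
            // Stands in for local TxRecoveryMessage handling: finish txId, release its locks.
            release(txId);
        }
    }

    public static void main(String[] args) {
        NaiveLockManager mgr = new NaiveLockManager();
        Waiter tx2 = new Waiter("tx2");
        Waiter tx1 = new Waiter("tx1");
        mgr.acquire(tx2); // tx2 holds the lock
        mgr.acquire(tx1); // conflict -> local recovery -> grant -> fail
        System.out.println("tx1 locked=" + tx1.locked + ", failed=" + (tx1.ex != null));
    }
}
```

Running `main` prints `tx1 locked=true, failed=true`, which is exactly the double resolution described above.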
> There is also a second problem: tx recovery should not run within the
> synchronized block of HeapLockManager. It can be moved to another pool;
> this would also prevent tx recovery, which releases the locks, from granting
> the lock to the waiter of tx1.
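A sketch of moving recovery off the synchronized path, in the same kind of hypothetical single-key model (`AsyncRecoveryDemo`, `LockManager`, `newRecoveryPool` and `awaitQuietly` are illustrative names). The attempt is resolved once inside the synchronized block, and recovery is submitted to a separate executor only after the monitor is released. Failing the conflicting attempt and letting tx1 retry later is an assumption made for illustration; the actual HeapLockManager policy may differ.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical single-key model; names are illustrative, not Ignite classes.
class AsyncRecoveryDemo {
    static final class Waiter {
        final String txId;
        boolean locked;
        Exception ex;
        Waiter(String txId) { this.txId = txId; }
    }

    static final class LockManager {
        private final ExecutorService recoveryPool;
        private String holder;

        LockManager(ExecutorService recoveryPool) { this.recoveryPool = recoveryPool; }

        // Returns the scheduled recovery, or null if the lock was granted immediately.
        Future<?> acquire(Waiter w) {
            final String conflicting;
            synchronized (this) {
                if (holder == null) {
                    holder = w.txId;
                    w.locked = true;
                    return null;
                }
                conflicting = holder;
                // The attempt is resolved exactly once, while the monitor is held.
                w.ex = new Exception("conflict with " + conflicting);
            }
            // Recovery runs on another pool after the synchronized block is left,
            // so it cannot grant the lock to the waiter being processed here.
            return recoveryPool.submit(() -> release(conflicting));
        }

        synchronized void release(String txId) {
            if (txId.equals(holder)) {
                holder = null;
            }
        }
    }

    // Daemon threads, so an unfinished pool never keeps the JVM alive.
    static ExecutorService newRecoveryPool() {
        return Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "tx-recovery");
            t.setDaemon(true);
            return t;
        });
    }

    static void awaitQuietly(Future<?> f) {
        try {
            f.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = newRecoveryPool();
        LockManager mgr = new LockManager(pool);
        mgr.acquire(new Waiter("tx2"));
        Waiter first = new Waiter("tx1");
        Future<?> recovery = mgr.acquire(first);
        System.out.println("first: locked=" + first.locked + ", failed=" + (first.ex != null));
        awaitQuietly(recovery);
        Waiter retry = new Waiter("tx1");
        mgr.acquire(retry);
        System.out.println("retry: locked=" + retry.locked);
        pool.shutdown();
    }
}
```

The first attempt is only failed (`first: locked=false, failed=true`), and once recovery has released tx2's lock, a new attempt by tx1 succeeds (`retry: locked=true`).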
> h3. Definition of done
>  * Only one of lock() or fail() can be applied to a lock attempt, never both.
> Keep in mind that a retry attempt may succeed even though the previous
> attempt failed. There are also lock upgrade cases: an S-lock can be taken,
> but an attempt to upgrade it to an X-lock can fail; the upgrade gets its own
> lock future, which is completed exceptionally, while the S-lock remains
> active;
>  * tx recovery is not executed synchronously within the synchronized block
> of HeapLockManager.
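A sketch of the first definition-of-done item, assuming one `CompletableFuture` per lock attempt (`LockAttemptDemo`, `LockAttempt`, `KeyLocks` and `tryAcquire` are hypothetical names). `CompletableFuture.complete` and `completeExceptionally` return `false` when the future is already completed, which gives lock-or-fail exclusivity per attempt. A retry or an upgrade creates a new attempt, so a failed X-upgrade leaves the held S-lock untouched. The model fails conflicting attempts immediately instead of queueing them; that is a simplification, not Ignite's behavior.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical single-key model; names are illustrative, not Ignite classes.
class LockAttemptDemo {
    enum LockMode { S, X }

    // One lock attempt owns one future; a retry or an upgrade is a new attempt.
    static final class LockAttempt {
        final String txId;
        final LockMode intended;
        final CompletableFuture<LockMode> future = new CompletableFuture<>();

        LockAttempt(String txId, LockMode intended) {
            this.txId = txId;
            this.intended = intended;
        }

        // complete() returns false if the future was already completed.
        void lock() {
            if (!future.complete(intended)) {
                throw new IllegalStateException("attempt already resolved");
            }
        }

        void fail(RuntimeException e) {
            if (!future.completeExceptionally(e)) {
                throw new IllegalStateException("attempt already resolved");
            }
        }
    }

    static final class KeyLocks {
        final Map<String, LockMode> held = new HashMap<>();

        synchronized LockAttempt tryAcquire(String txId, LockMode mode) {
            LockAttempt attempt = new LockAttempt(txId, mode);
            // S is compatible only with S; X conflicts with any lock held by another tx.
            boolean conflict = held.entrySet().stream()
                .anyMatch(e -> !e.getKey().equals(txId)
                    && (mode == LockMode.X || e.getValue() == LockMode.X));
            if (conflict) {
                attempt.fail(new RuntimeException(txId + " cannot take " + mode));
            } else {
                // Upgrade S -> X, but never downgrade an X that is already held.
                held.merge(txId, mode, (old, req) -> old == LockMode.X ? old : req);
                attempt.lock();
            }
            return attempt;
        }

        synchronized void release(String txId) {
            held.remove(txId);
        }
    }

    public static void main(String[] args) {
        KeyLocks key = new KeyLocks();
        key.tryAcquire("tx1", LockMode.S);
        key.tryAcquire("tx2", LockMode.S);
        LockAttempt upgrade = key.tryAcquire("tx1", LockMode.X); // blocked by tx2's S-lock
        System.out.println("upgrade failed=" + upgrade.future.isCompletedExceptionally()
            + ", tx1 holds " + key.held.get("tx1"));
        key.release("tx2");
        LockAttempt retry = key.tryAcquire("tx1", LockMode.X);
        System.out.println("retry -> " + retry.future.join());
    }
}
```

Running `main` prints `upgrade failed=true, tx1 holds S` and then `retry -> X`: the failed upgrade does not disturb the S-lock, and a later attempt can still succeed.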



--
This message was sent by Atlassian Jira
(v8.20.10#820010)