[ https://issues.apache.org/jira/browse/IGNITE-22980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vyacheslav Koptilin reassigned IGNITE-22980:
--------------------------------------------
Assignee: Vladislav Pyatkov
> Lock manager may fail and lock waiter simultaneously
> ----------------------------------------------------
>
> Key: IGNITE-22980
> URL: https://issues.apache.org/jira/browse/IGNITE-22980
> Project: Ignite
> Issue Type: Bug
> Reporter: Vladislav Pyatkov
> Assignee: Vladislav Pyatkov
> Priority: Major
> Labels: ignite-3
>
> h3. Motivation
> This behavior was most likely neither anticipated nor planned, but currently
> it is possible to acquire a lock:
> {code:java}
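> private void lock() {
>     // Promote the intended mode to the granted mode.
>     lockMode = intendedLockMode;
>     // Nothing is pending for this waiter anymore.
>     intendedLockMode = null;
>     intendedLocks.clear();
> }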
> {code}
> and also make the same waiter fail:
> {code:java}
> private void fail(LockException e) {
>     // Only records the failure; nothing checks whether lock() was already applied.
>     ex = e;
> }
> {code}
> with nothing (neither an assertion nor an explicit check) preventing both
> from being applied to the same waiter.
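> To make the problem concrete, here is a minimal, hypothetical model that
> mirrors the two snippets above. UnguardedWaiter and LockMode are illustrative
> names, it is not Ignite's actual waiter class, and it drops intendedLocks for
> brevity. Because neither method checks the other's effect, applying both
> leaves a waiter that holds a lock and carries a failure at the same time.

```java
/**
 * Hypothetical, simplified model of a lock waiter; not Ignite's WaiterImpl.
 * lock() and fail() mirror the quoted snippets, so nothing stops both from being applied.
 */
public class UnguardedWaiterDemo {
    enum LockMode { S, X }

    static final class UnguardedWaiter {
        LockMode lockMode;         // mode actually granted; null if none
        LockMode intendedLockMode; // mode this attempt is waiting for
        Exception ex;              // failure recorded for this attempt

        UnguardedWaiter(LockMode intended) {
            this.intendedLockMode = intended;
        }

        void lock() {
            lockMode = intendedLockMode;
            intendedLockMode = null;
        }

        void fail(Exception e) {
            ex = e; // no check that lock() has already been applied
        }

        /** True when the waiter both holds a lock and carries a failure. */
        boolean inconsistent() {
            return lockMode != null && ex != null;
        }
    }

    public static void main(String[] args) {
        UnguardedWaiter w = new UnguardedWaiter(LockMode.X);
        w.lock();                                   // tx2's locks were released, tx1 is granted
        w.fail(new Exception("conflict with tx2")); // conflict processing resumes afterwards
        System.out.println("granted=" + w.lockMode + ", failed=" + (w.ex != null)
                + ", inconsistent=" + w.inconsistent());
        // prints: granted=X, failed=true, inconsistent=true
    }
}
```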
> Scenario:
> * tx1 tries to acquire a lock and finds a conflicting transaction, tx2;
> * the lock manager tries to check the state and the coordinator of tx2;
> * the coordinator of tx2 has left, so a TxRecoveryMessage is sent;
> * the primary replica of the commit partition of tx2 is on the same node, so
> the TxRecoveryMessage is sent locally. It triggers tx recovery, so tx2 is
> finished and tx cleanup is performed locally. All of this happens in the
> same thread, and the locks of tx2 are released during the cleanup;
> * releasing the locks of tx2 allows the conflicting waiter of tx1 to
> acquire the lock;
> * processing of the conflicting transaction then resumes, and #fail is
> called on the same waiter, which has already acquired the lock.
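> The same-thread sequence is possible because Java intrinsic locks are
> reentrant: assuming the release path synchronizes on the same monitor,
> recovery running inside the synchronized block can call back into the lock
> manager. The single-key toy manager below is a hypothetical sketch
> (ToyLockManager, acquire and release are illustrative names, not
> HeapLockManager's API) that reproduces the scenario: tx1's waiter is granted
> the lock by the synchronous recovery of tx2 and is then failed anyway.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical single-key lock manager (not HeapLockManager) reproducing the scenario:
 * recovery of the holder runs synchronously inside the monitor and grants the waiter,
 * after which the waiter is failed anyway.
 */
public class ReentrantRecoveryDemo {
    static final class Waiter {
        final String txId;
        boolean locked;
        Exception failure;

        Waiter(String txId) {
            this.txId = txId;
        }
    }

    static final class ToyLockManager {
        private String holder; // tx currently holding the lock, or null
        private final Deque<Waiter> queue = new ArrayDeque<>();

        synchronized void acquire(Waiter w, Runnable recoverHolder) {
            if (holder == null) {
                holder = w.txId;
                w.locked = true;
                return;
            }
            String conflicting = holder;
            queue.add(w);
            // The holder's coordinator has left, so recovery runs right here, on this thread,
            // while the monitor is held. Intrinsic locks are reentrant, so release() can enter.
            recoverHolder.run();
            // Conflict processing resumes without re-checking whether w was granted meanwhile.
            w.failure = new Exception("conflict with " + conflicting);
        }

        synchronized void release(String txId) {
            if (!txId.equals(holder)) {
                return;
            }
            holder = null;
            Waiter next = queue.poll();
            if (next != null) {
                holder = next.txId;
                next.locked = true; // grant the lock to the next waiter
            }
        }

        synchronized String holder() {
            return holder;
        }
    }

    public static void main(String[] args) {
        ToyLockManager mgr = new ToyLockManager();
        mgr.acquire(new Waiter("tx2"), () -> { });
        Waiter w1 = new Waiter("tx1");
        // Recovery finishes tx2 and releases its locks on the calling thread.
        mgr.acquire(w1, () -> mgr.release("tx2"));
        System.out.println("tx1 locked=" + w1.locked + ", failed=" + (w1.failure != null)
                + ", holder=" + mgr.holder());
        // prints: tx1 locked=true, failed=true, holder=tx1
    }
}
```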
> There is also a second problem: tx recovery should not run inside the
> synchronized block of HeapLockManager. It can be moved to another pool; this
> would also prevent tx recovery, which releases the locks, from granting the
> lock to the waiter of tx1.
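> A sketch of moving recovery to another pool, under the same toy
> assumptions: recovery is handed to a dedicated executor instead of running
> inside the monitor and, as a simplification, a conflicting attempt is failed
> immediately rather than queued. The recovery thread must wait for the
> monitor before it can release tx2's locks, so it cannot grant the lock to
> tx1's waiter while the conflict is being processed, yet a later retry by
> tx1 can still succeed. ToyLockManager, runScenario and the pool are
> hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical toy lock manager (not HeapLockManager) where recovery of the conflicting
 * holder is handed to a separate executor instead of running inside the monitor.
 * Simplification: a conflicting attempt is failed immediately rather than queued.
 */
public class AsyncRecoveryDemo {
    static final class Waiter {
        final String txId;
        boolean locked;
        Exception failure;

        Waiter(String txId) {
            this.txId = txId;
        }
    }

    static final class ToyLockManager {
        private final ExecutorService recoveryPool; // hypothetical dedicated pool
        private String holder;                      // tx currently holding the lock, or null

        ToyLockManager(ExecutorService recoveryPool) {
            this.recoveryPool = recoveryPool;
        }

        synchronized void acquire(Waiter w, Runnable recoverHolder) {
            if (holder == null) {
                holder = w.txId;
                w.locked = true;
                return;
            }
            // Schedule recovery; it needs this monitor to release locks, so it cannot
            // run release() until acquire() has returned and the monitor is free.
            recoveryPool.execute(recoverHolder);
            // The attempt is resolved exactly once, here.
            w.failure = new Exception("conflict with " + holder);
        }

        synchronized void release(String txId) {
            if (txId.equals(holder)) {
                holder = null;
            }
        }
    }

    /** Returns {first attempt locked, first attempt failed, retry locked}. */
    static boolean[] runScenario() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        ToyLockManager mgr = new ToyLockManager(pool);
        mgr.acquire(new Waiter("tx2"), () -> { });
        Waiter first = new Waiter("tx1");
        mgr.acquire(first, () -> mgr.release("tx2"));
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS); // let the recovery finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        Waiter retry = new Waiter("tx1"); // a new attempt after the failed one
        mgr.acquire(retry, () -> { });
        return new boolean[] { first.locked, first.failure != null, retry.locked };
    }

    public static void main(String[] args) {
        boolean[] r = runScenario();
        System.out.println("first locked=" + r[0] + ", first failed=" + r[1]
                + ", retry locked=" + r[2]);
        // prints: first locked=false, first failed=true, retry locked=true
    }
}
```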
> h3. Definition of done
> * Only one of lock() or fail() can be applied to a lock attempt, never both.
> Keep in mind that a retry attempt may succeed even though the previous
> attempt failed. There are also lock upgrade cases: an S-lock can be taken,
> but an attempt to upgrade it to an X-lock can fail; the upgrade has its own
> lock future, which is completed exceptionally, while the S-lock remains
> active;
> * tx recovery is not executed synchronously within the synchronized block of
> HeapLockManager.
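> One possible way to meet the first item, sketched under assumptions
> (Waiter, newAttempt and the boolean return values are hypothetical, not
> Ignite's API): each attempt owns its own CompletableFuture, and both lock()
> and fail() resolve the attempt only while it is still pending, so whichever
> runs first wins. A retry or an upgrade starts a new attempt with a new
> future, and a failed upgrade fails only that future while the granted S
> mode is kept.

```java
import java.util.concurrent.CompletableFuture;

/**
 * Hypothetical sketch (not Ignite's WaiterImpl): every lock attempt owns a future, and both
 * lock() and fail() resolve it only while it is pending, so the first one wins.
 */
public class GuardedWaiterDemo {
    enum LockMode { S, X }

    static final class Waiter {
        private LockMode lockMode;                   // granted mode; survives a failed upgrade
        private LockMode intendedLockMode;           // mode requested by the current attempt
        private CompletableFuture<LockMode> attempt; // future of the current attempt

        /** Starts a new attempt (first try, retry or upgrade) with its own future. */
        synchronized CompletableFuture<LockMode> newAttempt(LockMode intended) {
            intendedLockMode = intended;
            attempt = new CompletableFuture<>();
            return attempt;
        }

        /** Grants the intended mode; returns false if the attempt was already resolved. */
        synchronized boolean lock() {
            if (attempt == null || attempt.isDone()) {
                return false;
            }
            lockMode = intendedLockMode;
            intendedLockMode = null;
            return attempt.complete(lockMode);
        }

        /** Fails the current attempt only; a previously granted mode stays active. */
        synchronized boolean fail(Exception e) {
            if (attempt == null || attempt.isDone()) {
                return false;
            }
            intendedLockMode = null;
            return attempt.completeExceptionally(e);
        }

        synchronized LockMode lockMode() {
            return lockMode;
        }
    }

    public static void main(String[] args) {
        // lock() then fail() on the same attempt: fail() is rejected.
        Waiter w = new Waiter();
        CompletableFuture<LockMode> a = w.newAttempt(LockMode.X);
        boolean locked = w.lock();
        boolean failed = w.fail(new Exception("conflict"));
        System.out.println("race: locked=" + locked + ", failed=" + failed
                + ", exceptional=" + a.isCompletedExceptionally());
        // prints: race: locked=true, failed=false, exceptional=false

        // A failed attempt followed by a successful retry.
        Waiter r = new Waiter();
        CompletableFuture<LockMode> first = r.newAttempt(LockMode.X);
        r.fail(new Exception("conflict"));
        CompletableFuture<LockMode> retry = r.newAttempt(LockMode.X);
        r.lock();
        System.out.println("retry: first failed=" + first.isCompletedExceptionally()
                + ", retry granted=" + retry.join());
        // prints: retry: first failed=true, retry granted=X

        // Failed S -> X upgrade: only the upgrade future fails, the S-lock is kept.
        Waiter u = new Waiter();
        u.newAttempt(LockMode.S);
        u.lock();
        CompletableFuture<LockMode> upgrade = u.newAttempt(LockMode.X);
        u.fail(new Exception("upgrade conflict"));
        System.out.println("upgrade: failed=" + upgrade.isCompletedExceptionally()
                + ", still held=" + u.lockMode());
        // prints: upgrade: failed=true, still held=S
    }
}
```

> Routing both outcomes through one future per attempt makes "first
> resolution wins" explicit; the synchronized check is still needed because
> lock() also updates lockMode before completing the future.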
--
This message was sent by Atlassian Jira
(v8.20.10#820010)