[
https://issues.apache.org/jira/browse/IGNITE-22980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Denis Chudov updated IGNITE-22980:
----------------------------------
Description:
h3. Motivation
This behavior was hardly anticipated or intended, but currently we can both
grant a lock to a waiter:
{code:java}
private void lock() {
    lockMode = intendedLockMode;
    intendedLockMode = null;
    intendedLocks.clear();
}
{code}
and make the same waiter fail:
{code:java}
private void fail(LockException e) {
    ex = e;
}
{code}
without any restriction (neither an assertion check nor an explicit
prohibition).
Scenario:
* tx1 tries to acquire a lock and finds a conflicting transaction tx2;
* the lock manager tries to check the state and the coordinator of tx2;
* the coordinator of tx2 has left, so TxRecoveryMessage is sent;
* the primary replica of the commit partition of tx2 is on the same node, so
TxRecoveryMessage is sent locally. It also triggers the tx recovery, so tx2 is
finished and tx cleanup is performed locally. All of this happens in the same
thread, and during tx cleanup the locks of tx2 are released;
* the release of the locks of tx2 allows the conflicting waiter of tx1 to
acquire a lock;
* the processing of the conflicting transaction continues, and #fail is called
on the same waiter.
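The scenario above can be reproduced in miniature. The following is a minimal
single-threaded sketch, not the actual HeapLockManager code: the class
DoubleOutcomeSketch, its Waiter, and the methods acquire and recoverAndCleanup
are hypothetical names made up for illustration. It only shows how a
synchronous, reentrant recovery call can grant the lock to a waiter that is
failed right afterwards.

```java
/** Hypothetical, heavily simplified model of the scenario; not the Ignite classes. */
class DoubleOutcomeSketch {
    /** Waiter with the same unguarded lock()/fail() shape as in the description. */
    static final class Waiter {
        final String txId;
        String lockMode;
        String intendedLockMode;
        Exception ex;

        Waiter(String txId, String intendedLockMode) {
            this.txId = txId;
            this.intendedLockMode = intendedLockMode;
        }

        void lock() {
            lockMode = intendedLockMode;
            intendedLockMode = null;
        }

        void fail(Exception e) {
            ex = e;
        }
    }

    private final Object mux = new Object();
    private Waiter owner;   // current lock holder
    private Waiter pending; // waiter whose conflict is being processed

    Waiter acquire(String txId, String mode, boolean ownerCoordinatorLeft) {
        synchronized (mux) {
            Waiter w = new Waiter(txId, mode);
            if (owner == null) {
                w.lock();
                owner = w;
                return w;
            }
            String conflictingTx = owner.txId;
            pending = w;
            if (ownerCoordinatorLeft) {
                // Assumption: recovery runs synchronously, in the same thread,
                // while the monitor is still held.
                recoverAndCleanup();
            }
            // Conflict processing continues, unaware that the waiter may
            // already have been granted the lock by the cleanup above.
            pending = null;
            w.fail(new IllegalStateException("Conflict with " + conflictingTx));
            return w;
        }
    }

    /** Finishes the abandoned tx and releases its lock, which grants the pending waiter. */
    private void recoverAndCleanup() {
        synchronized (mux) { // reentrant: the calling thread already owns the monitor
            owner = null;
            if (pending != null) {
                pending.lock();
                owner = pending;
                pending = null;
            }
        }
    }

    public static void main(String[] args) {
        DoubleOutcomeSketch manager = new DoubleOutcomeSketch();
        manager.acquire("tx2", "X", false);
        Waiter tx1 = manager.acquire("tx1", "X", true);
        // prints: lockMode=X, ex=Conflict with tx2
        System.out.println("lockMode=" + tx1.lockMode + ", ex=" + tx1.ex.getMessage());
    }
}
```

In this model the waiter of tx1 ends up with both a lock mode and an exception
set, which is the inconsistent state the issue describes.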
There is another problem as well: tx recovery should not run within the
synchronized block of HeapLockManager. It can be moved to another pool, which
would also prevent the tx recovery, which releases the locks, from granting
the lock to the waiter of tx1 in the same thread.
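One possible shape of such a change is sketched below. This is only an
illustration under assumptions: AsyncRecoverySketch, onConflict, recover and
the single-thread recoveryPool are hypothetical names, not the Ignite API. The
recovery task is handed to a separate executor, so it cannot release locks
reentrantly in the caller's thread; it has to wait for the monitor until the
caller has left its synchronized block.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch: recovery is offloaded to a separate pool. */
class AsyncRecoverySketch {
    private final Object mux = new Object();
    private final ExecutorService recoveryPool = Executors.newSingleThreadExecutor();
    private String owner = "tx2"; // tx currently holding the lock

    /**
     * Called when a waiter finds a conflicting tx whose coordinator has left.
     * The future yields true if recovery ran outside the calling thread.
     */
    CompletableFuture<Boolean> onConflict() {
        synchronized (mux) {
            String abandoned = owner;
            Thread caller = Thread.currentThread();
            // Only schedule the recovery here; it is not executed under this monitor
            // by the calling thread.
            return CompletableFuture.supplyAsync(() -> recover(abandoned, caller), recoveryPool);
        }
    }

    private boolean recover(String txId, Thread caller) {
        boolean offCallerThread = Thread.currentThread() != caller;
        synchronized (mux) { // waits until the caller has left its synchronized block
            if (txId.equals(owner)) {
                owner = null; // release the lock of the abandoned tx
            }
        }
        return offCallerThread;
    }

    String owner() {
        synchronized (mux) {
            return owner;
        }
    }

    void shutdown() {
        recoveryPool.shutdown();
    }
}
```

The design choice here is that the lock release still takes the same monitor,
but from a different thread, so it is serialized after the conflict processing
instead of being interleaved with it.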
h3. Definition of done
* Only one method can be applied to a lock attempt: either lock() or fail(),
but not both. Keep in mind that a retry attempt may succeed even though the
previous attempt failed. There are also lock upgrade cases: an S-lock can be
taken, but an attempt to upgrade it to an X-lock can fail; in that case there
will be another lock future that is completed exceptionally, while the S-lock
remains active;
* tx recovery is not executed synchronously within the synchronized block of
HeapLockManager.
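The first requirement could be enforced with a per-attempt future, as in the
minimal sketch below. The names GuardedWaiterSketch and startAttempt are
hypothetical, and the real waiter has more state (for example intendedLocks);
the sketch only illustrates that each attempt accepts exactly one outcome,
that a retry starts a fresh attempt, and that a failed upgrade leaves the
already held mode intact.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch of a waiter that accepts one outcome per lock attempt. */
class GuardedWaiterSketch {
    private String lockMode;         // mode currently held, null if none
    private String intendedLockMode; // mode requested by the attempt in progress
    private CompletableFuture<String> attempt;

    /** Starts a new attempt (initial acquisition, retry, or upgrade). */
    synchronized CompletableFuture<String> startAttempt(String mode) {
        intendedLockMode = mode;
        attempt = new CompletableFuture<>();
        return attempt;
    }

    /** Grants the intended mode; returns false if the attempt already has an outcome. */
    synchronized boolean lock() {
        if (attempt == null || attempt.isDone()) {
            return false;
        }
        lockMode = intendedLockMode;
        intendedLockMode = null;
        return attempt.complete(lockMode);
    }

    /** Fails the attempt; a previously held mode (for example S) stays active. */
    synchronized boolean fail(Exception e) {
        if (attempt == null || attempt.isDone()) {
            return false;
        }
        intendedLockMode = null;
        return attempt.completeExceptionally(e);
    }

    synchronized String lockMode() {
        return lockMode;
    }
}
```

Returning a boolean is one option; an assertion or an exception on the second
call would make the violation louder, which may be preferable in tests.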
was:
h3. Motivation
This behavior was hardly anticipated or intended, but currently we can both
grant a lock to a waiter:
{code:java}
private void lock() {
    lockMode = intendedLockMode;
    intendedLockMode = null;
    intendedLocks.clear();
}
{code}
and make the same waiter fail:
{code:java}
private void fail(LockException e) {
    ex = e;
}
{code}
without any restriction (neither an assertion check nor an explicit
prohibition).
Scenario:
* tx1 tries to acquire a lock and finds a conflicting transaction tx2;
* the lock manager tries to check the state and the coordinator of tx2;
* the coordinator of tx2 has left, so TxRecoveryMessage is sent;
* the primary replica of the commit partition of tx2 is on the same node, so
TxRecoveryMessage is sent locally. It also triggers the tx recovery, so tx2 is
finished and tx cleanup is performed locally. All of this happens in the same
thread, and during tx cleanup the locks of tx2 are released;
* the release of the locks of tx2 allows the conflicting waiter of tx1 to
acquire a lock;
* the processing of the conflicting transaction continues, and #fail is called
on the same waiter.
There is another problem as well: tx recovery should not run within the
synchronized block of HeapLockManager. It can be moved to another pool, which
would also prevent the tx recovery, which releases the locks, from granting
the lock to the waiter of tx1 in the same thread.
h3. Definition of done
* Only one method can be applied to a lock attempt: either lock() or fail(),
but not both. Keep in mind that a retry attempt may succeed even though the
previous attempt failed.
* tx recovery is not executed synchronously within the synchronized block of
HeapLockManager.
> Lock manager may fail and lock waiter simultaneously
> ----------------------------------------------------
>
> Key: IGNITE-22980
> URL: https://issues.apache.org/jira/browse/IGNITE-22980
> Project: Ignite
> Issue Type: Bug
> Reporter: Vladislav Pyatkov
> Priority: Major
> Labels: ignite-3
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)