[ 
https://issues.apache.org/jira/browse/IGNITE-17457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Korotkov updated IGNITE-17457:
-------------------------------------
    Description: 
An Ignite cluster may lock up (all client operations block) after the tx 
recovery procedure that runs when the tx primary node fails.

A prepared transaction may remain uncommitted on a backup node after tx 
recovery. The partition exchange then cannot complete, so the cluster stays 
locked.
----
The immediate cause is a race condition in the method:
{code:java}
org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter::markFinalizing(RECOVERY_FINISH){code}
If 2 or more backups are configured, it may be called concurrently for the same 
transaction both from the recovery procedure:
{code:java}
IgniteTxManager::commitIfPrepared{code}
and from the tx recovery request handler:
{code:java}
IgniteTxHandler::processCheckPreparedTxRequest{code}
The problem occurs if the thread context is switched between reading the old 
finalization status and updating it.
----
The problematic sequence of events is as follows (the lock occurs on node1):

1. Start a cluster with 3 nodes (node0, node1, node2) and a cache with 2 backups.
2. On node2, start and prepare a pessimistic transaction, choosing a key whose 
primary partition is stored on node2.
3. Kill node2.
4. The tx recovery procedure starts on both node0 and node1.
5. As part of the recovery procedure, node0 sends a tx recovery request to node1.
6. The following steps are executed on node1 in two threads ("procedure", 
a system pool thread executing the tx recovery procedure, and "handler", 
a striped pool thread processing the tx recovery request sent from 
node0):
 - tx.finalization == NONE
 - "procedure": calls markFinalizing(RECOVERY_FINISH)
 - "handler": calls markFinalizing(RECOVERY_FINISH)
 - "procedure": gets old tx.finalization - it's NONE
 - "handler": gets old tx.finalization - it's NONE
 - "handler": updates tx.finalization - now it's RECOVERY_FINISH
 - "procedure": tries to update tx.finalization via compareAndSet and fails 
because the current value no longer matches the expected NONE.
 - "procedure": stops processing the transaction and does not try to commit it.
 - The transaction remains unfinished on node1.
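
The interleaving in step 6 can be replayed with a small model. This is only a sketch: `FinalizationRaceSketch`, `TxModel`, `readOld` and `update` are hypothetical names made up for illustration, not Ignite classes or methods, and the read-then-compareAndSet logic is a simplification based on the description above. The steps run on a single thread in the problematic order so that the result is deterministic.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Simplified, hypothetical model of the read-then-compareAndSet race; not the Ignite source. */
final class FinalizationRaceSketch {
    enum Status { NONE, USER_FINISH, RECOVERY_FINISH }

    /** Stand-in for a transaction that holds tx.finalization. */
    static final class TxModel {
        final AtomicReference<Status> finalization = new AtomicReference<>(Status.NONE);

        /** First half of the modeled markFinalizing: read the old status. */
        Status readOld() {
            return finalization.get();
        }

        /** Second half: update only if the status still equals the value read earlier. */
        boolean update(Status old) {
            return old != Status.USER_FINISH
                && finalization.compareAndSet(old, Status.RECOVERY_FINISH);
        }
    }

    /** Replays the step 6 interleaving; returns {procedureMarked, handlerMarked}. */
    public static boolean[] replayInterleaving() {
        TxModel tx = new TxModel();

        Status procedureOld = tx.readOld(); // "procedure" reads NONE
        Status handlerOld = tx.readOld();   // "handler" reads NONE

        boolean handlerMarked = tx.update(handlerOld);     // NONE -> RECOVERY_FINISH succeeds
        boolean procedureMarked = tx.update(procedureOld); // fails: status is no longer NONE

        return new boolean[] { procedureMarked, handlerMarked };
    }

    public static void main(String[] args) {
        boolean[] res = replayInterleaving();
        System.out.println("procedure marked: " + res[0]); // prints "procedure marked: false"
        System.out.println("handler marked: " + res[1]);   // prints "handler marked: true"
    }
}
```

In this model the "procedure" side sees a failed mark even though the status is already RECOVERY_FINISH, which corresponds to the point where the recovery procedure gives up on the transaction.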

----
A reproducer is in the pull request.


> Cluster locks after the transaction recovery procedure if the tx primary node 
> fails
> ----------------------------------------------------------------------------------
>
>                 Key: IGNITE-17457
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17457
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Sergey Korotkov
>            Assignee: Sergey Korotkov
>            Priority: Major


