[ 
https://issues.apache.org/jira/browse/IGNITE-20560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

 Kirill Sizov reassigned IGNITE-20560:
--------------------------------------

    Assignee:  Kirill Sizov

> It's possible to execute commands on a finished transaction under certain 
> circumstances
> ---------------------------------------------------------------------------------------
>
>                 Key: IGNITE-20560
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20560
>             Project: Ignite
>          Issue Type: Task
>            Reporter:  Kirill Sizov
>            Assignee:  Kirill Sizov
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> If a cleanup operation crashes, it does not affect the transaction it was 
> called for, since the transaction has already finished.
> However, under certain circumstances *the validation that prevents commands 
> from being executed on a finished transaction may be broken.*
> The issue is that we have 
> {{PartitionReplicaListener.TxCleanupReadyFutureList.state}} that duplicates 
> local txState, and is updated in the cleanup command handler.
> *Details*
> {{PartitionReplicaListener.TxCleanupReadyFutureList.state}} is
>  * +updated+ in {{PartitionReplicaListener.processTxCleanupAction}} and
>  * +read+ in {{PartitionReplicaListener.appendTxCommand}}.
> If the update has not been called because of a crash, the code in 
> {{appendTxCommand}}:
> {code:java}
> txCleanupReadyFutures.compute(txId, (id, txOps) -> {
>     if (txOps == null) {
>         txOps = new TxCleanupReadyFutureList();
>     }
>     if (isFinalState(txOps.state)) {
>         fut.completeExceptionally(
>                 new TransactionException(TX_FAILED_READ_WRITE_OPERATION_ERR, "Transaction is already finished."));
>     } else {
>         txOps.futures.computeIfAbsent(cmdType, type -> new ArrayList<>()).add(fut);
>     }
>     return txOps;
> });{code}
> will still read {{txOps.state}} as {{PENDING}} and will allow this command to 
> execute instead of throwing a {{TransactionException}}.
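The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the Ignite code: `DuplicatedStateSketch`, `CleanupFutures`, `finish`, the `cleanupCrashes` flag, the string transaction ids, and the use of `IllegalStateException` in place of `TransactionException` are all hypothetical stand-ins chosen for illustration. It only models the idea that the state is kept twice and that the second copy is updated by the cleanup handler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the problem: the tx state is stored twice,
// and the second copy is only updated by the cleanup handler.
class DuplicatedStateSketch {
    enum TxState { PENDING, COMMITTED, ABORTED }

    static boolean isFinalState(TxState s) {
        return s == TxState.COMMITTED || s == TxState.ABORTED;
    }

    // Stand-in for TxCleanupReadyFutureList: futures plus a duplicated state.
    static class CleanupFutures {
        TxState state = TxState.PENDING;
        final List<CompletableFuture<Void>> futures = new ArrayList<>();
    }

    // Stand-in for the local tx state map (the first copy of the state).
    final Map<String, TxState> txStateMap = new ConcurrentHashMap<>();
    // Stand-in for txCleanupReadyFutures (holds the second copy of the state).
    final Map<String, CleanupFutures> cleanupFutures = new ConcurrentHashMap<>();

    // Finishes a transaction; the cleanup step may "crash" before it
    // updates the duplicated state.
    void finish(String txId, TxState finalState, boolean cleanupCrashes) {
        txStateMap.put(txId, finalState);
        if (cleanupCrashes) {
            return; // the duplicated state stays PENDING
        }
        cleanupFutures.computeIfAbsent(txId, id -> new CleanupFutures()).state = finalState;
    }

    // Same shape as the check quoted above: it reads only the duplicated state.
    CompletableFuture<Void> appendTxCommand(String txId) {
        CompletableFuture<Void> fut = new CompletableFuture<>();
        cleanupFutures.compute(txId, (id, txOps) -> {
            if (txOps == null) {
                txOps = new CleanupFutures();
            }
            if (isFinalState(txOps.state)) {
                fut.completeExceptionally(new IllegalStateException("Transaction is already finished."));
            } else {
                txOps.futures.add(fut);
            }
            return txOps;
        });
        return fut;
    }
}
```

In this model, a command appended after `finish(txId, COMMITTED, true)` is accepted even though `txStateMap` already holds a final state, while the same command is rejected when the cleanup step ran.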
> *Motivation*
>  The identified issue is that we store the transaction state in two places on 
> the primary replica:
> * in txCleanupReadyFutures, together with the cleanup futures,
> * in TxManager#txStateMap, the local tx state map where it is cached.
> *Definition of done*
> # We should remove the transaction state from TxCleanupReadyFutureList and 
> use the state that is stored in TxManager and available through 
> TxManagerImpl#stateMeta(txId)
> # To fix the tests in which we suppress TxCleanupReplicaRequest 
> (ItTxDistributedTestSingleNodeNoCleanupMessage#testTransactionAlreadyRolledback,
>  #testTransactionAlreadyCommitted), we have to check the transaction state map 
> on the transaction coordinator before trying to enlist a new operation in the 
> transaction. Although these tests are not completely honest, this way we 
> will avoid executing an operation after a commit timestamp has already been chosen.
> # The tests from point 2 should be rewritten, because we do not expect 
> TxCleanupReplicaRequest to simply be dropped (the handling of 
> TxCleanupReplicaRequest is customized so as to skip it) while the commit 
> invocation still completes.
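The direction proposed in point 1 can be sketched as follows. This is an illustration under stated assumptions, not the actual change: `SingleSourceSketch`, its `finish` method, the string transaction ids, and `IllegalStateException` in place of `TransactionException` are hypothetical, and the plain `txStateMap` lookup merely stands in for whatever `TxManagerImpl#stateMeta(txId)` returns. Treating an unknown transaction as `PENDING` is also an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the proposed direction: the cleanup-future map no
// longer carries its own copy of the state; the check reads the single
// tx state map instead.
class SingleSourceSketch {
    enum TxState { PENDING, COMMITTED, ABORTED }

    static boolean isFinalState(TxState s) {
        return s == TxState.COMMITTED || s == TxState.ABORTED;
    }

    // The only place where the tx state is stored in this sketch.
    final Map<String, TxState> txStateMap = new ConcurrentHashMap<>();
    // Cleanup futures only, with no duplicated state field.
    final Map<String, List<CompletableFuture<Void>>> cleanupFutures = new ConcurrentHashMap<>();

    // Finishing a transaction updates the single state map; whether the
    // cleanup later crashes no longer affects the validation below.
    void finish(String txId, TxState finalState) {
        txStateMap.put(txId, finalState);
    }

    CompletableFuture<Void> appendTxCommand(String txId) {
        CompletableFuture<Void> fut = new CompletableFuture<>();
        cleanupFutures.compute(txId, (id, futures) -> {
            if (futures == null) {
                futures = new ArrayList<>();
            }
            // Assumption: a transaction with no recorded state is still pending.
            TxState state = txStateMap.getOrDefault(id, TxState.PENDING);
            if (isFinalState(state)) {
                fut.completeExceptionally(new IllegalStateException("Transaction is already finished."));
            } else {
                futures.add(fut);
            }
            return futures;
        });
        return fut;
    }
}
```

With a single copy of the state, a command appended after `finish` is rejected regardless of whether the cleanup handler ever ran.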
> *_Please note there are tests muted with this task._*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
