[ https://issues.apache.org/jira/browse/IGNITE-17578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Denis Chudov updated IGNITE-17578:
----------------------------------
    Description: 
h3. Motivation

According to the tx commit process design, control must be returned to the 
outer logic right after the COMMITED/ABORTED txn state has been replicated. The 
follow-up cleanup process, which sends replica cleanup requests to all enlisted 
replication groups, should be asynchronous.

Currently this is not the case:
{code:java}
/**
 * Process transaction finish request:
 * <ol>
 *     <li>Evaluate commit timestamp.</li>
 *     <li>Run specific raft {@code FinishTxCommand} command, that will apply txn state to corresponding txStateStorage.</li>
 *     <li>Send cleanup requests to all enlisted primary replicas.</li>
 * </ol>
 * This operation is NOT idempotent, because of commit timestamp evaluation.
 *
 * @param request Transaction finish request.
 * @return future result of the operation.
 */
private CompletableFuture<Object> processTxFinishAction(TxFinishRequest request) {
    HybridTimestamp commitTimestamp = hybridClock.now();

    List<String> aggregatedGroupIds = request.groups().values().stream().flatMap(List::stream).collect(Collectors.toList());

    UUID txId = request.txId();

    boolean commit = request.commit();

    CompletableFuture<Object> chaneStateFuture = raftClient.run(
            new FinishTxCommand(
                    txId,
                    commit,
                    commitTimestamp,
                    aggregatedGroupIds
            )
    );

    // TODO: https://issues.apache.org/jira/browse/IGNITE-17578
    chaneStateFuture.thenRun(
            () -> request.groups().forEach(
                    (recipientNode, replicationGroupIds) -> txManager.cleanup(
                            recipientNode,
                            replicationGroupIds,
                            txId,
                            commit,
                            commitTimestamp
                    )
            )
    );

    return chaneStateFuture;
}
{code}
In addition, the cleanup process (which is guaranteed to be idempotent) is 
expected to be retried until it succeeds.
h3. Definition of Done
 * Cleanup requests should be sent asynchronously.
 * Cleanup failures, including timeouts, should trigger another cleanup attempt, 
repeated until success. There is currently no failure handler, so retrying is 
the only option.
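
As an illustration of the first item, here is a minimal sketch of handing cleanup off to a separate executor, so that the caller gets control back as soon as the txn state is replicated. {{AsyncCleanupSketch}}, {{finishThenCleanupAsync}} and the {{cleanupExecutor}} parameter are hypothetical names, not existing Ignite API; the {{Runnable}} stands in for the {{txManager.cleanup(...)}} calls shown above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

/** Hypothetical sketch, not Ignite code: decouples cleanup from the finish future. */
final class AsyncCleanupSketch {
    /**
     * Returns the state replication future unchanged. Cleanup is submitted to
     * {@code cleanupExecutor} once replication completes normally, so it never runs
     * on the thread that completes {@code changeStateFuture}.
     */
    static <T> CompletableFuture<T> finishThenCleanupAsync(
            CompletableFuture<T> changeStateFuture,
            Runnable sendCleanupRequests,
            Executor cleanupExecutor
    ) {
        // Assumption: cleanup is skipped if state replication fails,
        // the same as with thenRun in the current code.
        changeStateFuture.thenRunAsync(sendCleanupRequests, cleanupExecutor);

        // The caller observes only state replication, not cleanup.
        return changeStateFuture;
    }
}
```

With this shape, a slow or blocked cleanup does not delay whoever waits on the returned future. Cleanup failures are not handled here.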

h3. Implementation Notes

A cleanup executor, properly shared between replicas, seems to suit us. The 
executor is needed to schedule the next cleanup attempt after a failure, so 
that the attempt is made not immediately after the failure but after the 
replicas have been successfully rehashed, when their state allows the cleanup 
to succeed with high probability.
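
One possible shape of such executor-based retrying, as a sketch only. {{CleanupRetrySketch}} and {{retryUntilSuccess}} are hypothetical names, not Ignite API. A fixed delay is assumed here in place of waiting for the replicas to reach a suitable state, which this sketch does not model.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical sketch, not Ignite code: repeats an idempotent async cleanup until it succeeds. */
final class CleanupRetrySketch {
    /** Returns a future that completes once some cleanup attempt succeeds. */
    static CompletableFuture<Void> retryUntilSuccess(
            Supplier<CompletableFuture<Void>> cleanupAttempt,
            ScheduledExecutorService cleanupExecutor,
            long retryDelayMillis
    ) {
        CompletableFuture<Void> result = new CompletableFuture<>();
        attempt(cleanupAttempt, cleanupExecutor, retryDelayMillis, result);
        return result;
    }

    private static void attempt(
            Supplier<CompletableFuture<Void>> cleanupAttempt,
            ScheduledExecutorService cleanupExecutor,
            long retryDelayMillis,
            CompletableFuture<Void> result
    ) {
        CompletableFuture<Void> attemptFuture;
        try {
            attemptFuture = cleanupAttempt.get();
        } catch (RuntimeException e) {
            // Treat a synchronous failure the same way as a failed future.
            attemptFuture = CompletableFuture.failedFuture(e);
        }

        attemptFuture.whenComplete((ignored, err) -> {
            if (err == null) {
                result.complete(null);
            } else {
                // Repeating is safe only because cleanup is idempotent.
                // The next attempt is planned on the executor instead of running right away.
                cleanupExecutor.schedule(
                        () -> attempt(cleanupAttempt, cleanupExecutor, retryDelayMillis, result),
                        retryDelayMillis,
                        TimeUnit.MILLISECONDS
                );
            }
        });
    }

    public static void main(String[] args) {
        ScheduledExecutorService cleanupExecutor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger attempts = new AtomicInteger();

        // Simulated cleanup: fails twice (for example, with a timeout), then succeeds.
        Supplier<CompletableFuture<Void>> cleanup = () -> attempts.incrementAndGet() < 3
                ? CompletableFuture.<Void>failedFuture(new RuntimeException("simulated timeout"))
                : CompletableFuture.<Void>completedFuture(null);

        retryUntilSuccess(cleanup, cleanupExecutor, 50).join();
        System.out.println("cleanup succeeded after " + attempts.get() + " attempts");
        cleanupExecutor.shutdown();
    }
}
```

The sketch retries without an upper bound, matching the "until success" requirement. If the executor has been shut down, the next attempt is not scheduled and the returned future never completes.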

 

 

  was:
h3. Motivation

According to the tx commit process design, control must be returned to the 
outer logic right after the COMMITED/ABORTED txn state has been replicated. The 
follow-up cleanup process, which sends replica cleanup requests to all enlisted 
replication groups, should be asynchronous.

Currently this is not the case:
{code:java}
/**
 * Process transaction finish request:
 * <ol>
 *     <li>Evaluate commit timestamp.</li>
 *     <li>Run specific raft {@code FinishTxCommand} command, that will apply txn state to corresponding txStateStorage.</li>
 *     <li>Send cleanup requests to all enlisted primary replicas.</li>
 * </ol>
 * This operation is NOT idempotent, because of commit timestamp evaluation.
 *
 * @param request Transaction finish request.
 * @return future result of the operation.
 */
private CompletableFuture<Object> processTxFinishAction(TxFinishRequest request) {
    HybridTimestamp commitTimestamp = hybridClock.now();

    List<String> aggregatedGroupIds = request.groups().values().stream().flatMap(List::stream).collect(Collectors.toList());

    UUID txId = request.txId();

    boolean commit = request.commit();

    CompletableFuture<Object> chaneStateFuture = raftClient.run(
            new FinishTxCommand(
                    txId,
                    commit,
                    commitTimestamp,
                    aggregatedGroupIds
            )
    );

    // TODO: https://issues.apache.org/jira/browse/IGNITE-17578
    chaneStateFuture.thenRun(
            () -> request.groups().forEach(
                    (recipientNode, replicationGroupIds) -> txManager.cleanup(
                            recipientNode,
                            replicationGroupIds,
                            txId,
                            commit,
                            commitTimestamp
                    )
            )
    );

    return chaneStateFuture;
}
{code}
In addition, the cleanup process (which is guaranteed to be idempotent) is 
expected to be retried until it succeeds.
h3. Definition of Done
 * Cleanup requests should be sent asynchronously.
 * Cleanup failures, including timeouts, should trigger another cleanup attempt, 
repeated until success. There is currently no failure handler, so retrying is 
the only option.

h3. Implementation Notes

A cleanup executor, properly shared between replicas, seems to suit us.

 

 


> Transactions: async cleanup processing on tx commit
> ---------------------------------------------------
>
>                 Key: IGNITE-17578
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17578
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Alexander Lapin
>            Priority: Major
>              Labels: ignite-3, transaction3_rw



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
