[jira] [Updated] (IGNITE-23728) Guard disaster recovery metastore invokes by trigger revision check

Kirill Gusakov (Jira) Tue, 17 Dec 2024 04:11:03 -0800


     [ https://issues.apache.org/jira/browse/IGNITE-23728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Kirill Gusakov updated IGNITE-23728:
------------------------------------
    Description: 
At the moment, DisasterRecoveryManager#resetPartitions performs an 
unconditional put to the metastore:

{code:java}
private CompletableFuture<Void> processNewRequest(DisasterRecoveryRequest request) {
    UUID operationId = request.operationId();

    CompletableFuture<Void> operationFuture = new CompletableFuture<Void>()
            .whenComplete((v, throwable) -> ongoingOperationsById.remove(operationId))
            .orTimeout(TIMEOUT_SECONDS, TimeUnit.SECONDS);

    ongoingOperationsById.put(operationId, operationFuture);

    metaStorageManager.put(
            RECOVERY_TRIGGER_KEY,
            VersionedSerialization.toBytes(request, DisasterRecoveryRequestSerializer.INSTANCE));

    return operationFuture;
}
{code}

Instead, automatic calls from HA must guard this write with the revision of 
the trigger event, for example:

{code:java}
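metaStorageManager.invoke(
        // Apply only if the stored trigger revision is older than this event's revision.
        Conditions.value(RECOVERY_TRIGGER_REVISION).lt(longToBytesKeepingOrder(triggerRevision)),
        List.of(
                Operations.put(
                        RECOVERY_TRIGGER_KEY,
                        VersionedSerialization.toBytes(request, DisasterRecoveryRequestSerializer.INSTANCE)),
                // Store the revision in the same byte encoding that the condition compares against.
                Operations.put(RECOVERY_TRIGGER_REVISION, longToBytesKeepingOrder(triggerRevision))
        ),
        List.of(Operations.noop())
);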
{code}
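
The condition above compares byte arrays, so the revision has to be encoded in a way that preserves numeric order. The sketch below shows one common way to do that: write the long big-endian with the sign bit flipped, so that an unsigned left-to-right byte comparison agrees with signed numeric order. The class and method names are hypothetical, and both the exact encoding behind longToBytesKeepingOrder and the way the metastore compares values are assumptions here, not facts taken from the ticket.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Illustrative order-preserving long encoding; not the Ignite implementation. */
final class OrderPreservingLongSketch {
    /** Encodes a signed long as 8 big-endian bytes with the sign bit flipped. */
    static byte[] encode(long value) {
        // Flipping the sign bit maps Long.MIN_VALUE..Long.MAX_VALUE onto 0x00..0xFF-prefixed
        // byte strings in increasing order. ByteBuffer is big-endian by default.
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    /** Compares two encoded values byte by byte, treating each byte as unsigned. */
    static int compareEncoded(byte[] left, byte[] right) {
        return Arrays.compareUnsigned(left, right);
    }

    public static void main(String[] args) {
        System.out.println(compareEncoded(encode(5), encode(7)) < 0);     // true
        System.out.println(compareEncoded(encode(-1), encode(0)) < 0);    // true
        System.out.println(compareEncoded(encode(255), encode(256)) < 0); // true
    }
}
```

Without the sign-bit flip, a negative value would encode with a leading 0xFF byte and sort after every non-negative value under unsigned comparison.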

Manual reset calls, by contrast, should remain a simple unconditional put, as 
before.
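
The intended semantics of the two paths can be sketched with a small in-memory stand-in for the metastore. Everything here is hypothetical (class name, method names, the use of a String for the request), and the sketch assumes that an absent revision key counts as older than any trigger revision, which the ticket does not state.

```java
/** In-memory stand-in for the guarded trigger write; every name here is hypothetical. */
final class GuardedTriggerSketch {
    private String request;        // stands in for the value under RECOVERY_TRIGGER_KEY
    private Long triggerRevision;  // stands in for RECOVERY_TRIGGER_REVISION; null means absent

    /** Automatic path: write only if the stored revision is older than the given one. */
    synchronized boolean putIfNewer(String newRequest, long newRevision) {
        // Assumption: an absent revision counts as older than any trigger revision.
        if (triggerRevision != null && triggerRevision >= newRevision) {
            return false; // stale or repeated trigger event: behave like noop()
        }
        // Both values change together, mirroring the two puts in the success branch.
        request = newRequest;
        triggerRevision = newRevision;
        return true;
    }

    /** Manual path: unconditional put that leaves the stored revision untouched. */
    synchronized void putUnconditionally(String newRequest) {
        request = newRequest;
    }

    synchronized String currentRequest() {
        return request;
    }

    public static void main(String[] args) {
        GuardedTriggerSketch store = new GuardedTriggerSketch();
        System.out.println(store.putIfNewer("auto@10", 10)); // true
        System.out.println(store.putIfNewer("auto@7", 7));   // false
        System.out.println(store.putIfNewer("auto@10", 10)); // false
        store.putUnconditionally("manual");
        System.out.println(store.currentRequest());          // manual
    }
}
```

The strict comparison means a trigger event that is replayed with the same revision is ignored, so the automatic path is idempotent per event.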

*UPDATE*
The initial implementation approach is incorrect in two details:
- We can't use a single key for all zones, because only one zone "wins" in 
that case. Instead, we need a separate key per zone.
- The update request must carry the list of target partitions to reset instead 
of the tableId, because there is now only one update request per zone on any 
topology update.
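
A minimal sketch of the two corrections, again against an in-memory stand-in: the revision guard is keyed by zone, and the request carries a set of partition IDs rather than a tableId. The key format, the request shape, and all names are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Per-zone guarded trigger sketch; key format and request shape are hypothetical. */
final class PerZoneTriggerSketch {
    /** Hypothetical request: one per zone, listing the partitions to reset. */
    static final class ResetRequest {
        final int zoneId;
        final Set<Integer> partitionIds;
        final long triggerRevision;

        ResetRequest(int zoneId, Set<Integer> partitionIds, long triggerRevision) {
            this.zoneId = zoneId;
            this.partitionIds = Set.copyOf(partitionIds);
            this.triggerRevision = triggerRevision;
        }
    }

    private final Map<String, Long> revisions = new HashMap<>();
    private final Map<Integer, ResetRequest> accepted = new HashMap<>();

    /** Hypothetical per-zone key, so zones do not compete for one revision. */
    static String revisionKey(int zoneId) {
        return "recovery.trigger.revision." + zoneId;
    }

    /** Accepts the request only if it is newer than the last one for the same zone. */
    synchronized boolean submit(ResetRequest request) {
        String key = revisionKey(request.zoneId);
        Long stored = revisions.get(key);
        if (stored != null && stored >= request.triggerRevision) {
            return false; // stale for this zone; other zones are unaffected
        }
        revisions.put(key, request.triggerRevision);
        accepted.put(request.zoneId, request);
        return true;
    }

    synchronized Set<Integer> partitionsToReset(int zoneId) {
        ResetRequest request = accepted.get(zoneId);
        return request == null ? Set.of() : request.partitionIds;
    }

    public static void main(String[] args) {
        PerZoneTriggerSketch store = new PerZoneTriggerSketch();
        System.out.println(store.submit(new ResetRequest(1, Set.of(0, 1), 10))); // true
        // With a single shared key this would be rejected, because 5 < 10.
        System.out.println(store.submit(new ResetRequest(2, Set.of(3), 5)));     // true
        System.out.println(store.submit(new ResetRequest(1, Set.of(2), 9)));     // false
    }
}
```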





> Guard disaster recovery metastore invokes by trigger revision check
> -------------------------------------------------------------------
>
>                 Key: IGNITE-23728
>                 URL: https://issues.apache.org/jira/browse/IGNITE-23728
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Kirill Gusakov
>            Assignee: Kirill Gusakov
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 40m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
