[
https://issues.apache.org/jira/browse/IGNITE-23728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kirill Gusakov updated IGNITE-23728:
------------------------------------
Description:
At the moment, DisasterRecoveryManager#resetPartitions performs an
unconditional put to the metastore:
{code:java}
private CompletableFuture<Void> processNewRequest(DisasterRecoveryRequest request) {
    UUID operationId = request.operationId();

    CompletableFuture<Void> operationFuture = new CompletableFuture<Void>()
            .whenComplete((v, throwable) -> ongoingOperationsById.remove(operationId))
            .orTimeout(TIMEOUT_SECONDS, TimeUnit.SECONDS);

    ongoingOperationsById.put(operationId, operationFuture);

    metaStorageManager.put(
            RECOVERY_TRIGGER_KEY,
            VersionedSerialization.toBytes(request, DisasterRecoveryRequestSerializer.INSTANCE));

    return operationFuture;
}
{code}
Instead, automatic calls from HA must guard this write with the revision of
the trigger event, for example:
{code:java}
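metaStorageManager.invoke(
        // Apply the update only if the stored revision is older than the trigger revision.
        Conditions.value(RECOVERY_TRIGGER_REVISION).lt(longToBytesKeepingOrder(triggerRevision)),
        List.of(
                Operations.put(
                        RECOVERY_TRIGGER_KEY,
                        VersionedSerialization.toBytes(request, DisasterRecoveryRequestSerializer.INSTANCE)),
                // Store the revision in the same order-preserving encoding the condition compares against.
                Operations.put(RECOVERY_TRIGGER_REVISION, longToBytesKeepingOrder(triggerRevision))
        ),
        List.of(Operations.noop())
);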
{code}
Manual reset calls, by contrast, should remain a simple unconditional put, as
before.
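The difference between the two paths can be illustrated with a toy in-memory store. This is only a sketch of the compare-and-put idea, not the Ignite metastore API: the class RevisionGuardSketch and its methods are hypothetical, and treating a missing revision as "older than anything" is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory stand-in for the metastore; none of this is the Ignite API. */
class RevisionGuardSketch {
    private final Map<String, String> requests = new HashMap<>();
    private final Map<String, Long> revisions = new HashMap<>();

    /** Automatic (HA) path: write only if the trigger revision is newer than the stored one. */
    synchronized boolean putIfNewer(String triggerKey, String request, long triggerRevision) {
        Long stored = revisions.get(triggerKey);

        // Assumption: a missing revision counts as older than any trigger revision.
        if (stored != null && stored >= triggerRevision) {
            return false; // Stale or duplicate trigger: behaves like the noop branch.
        }

        // Both writes happen under one lock, mimicking a single atomic invoke.
        requests.put(triggerKey, request);
        revisions.put(triggerKey, triggerRevision);
        return true;
    }

    /** Manual path: plain unconditional put; the stored revision is left untouched. */
    synchronized void putUnconditionally(String triggerKey, String request) {
        requests.put(triggerKey, request);
    }

    synchronized String currentRequest(String triggerKey) {
        return requests.get(triggerKey);
    }

    public static void main(String[] args) {
        RevisionGuardSketch store = new RevisionGuardSketch();

        System.out.println(store.putIfNewer("trigger", "auto-reset@5", 5L));
        System.out.println(store.putIfNewer("trigger", "auto-reset@3", 3L));

        store.putUnconditionally("trigger", "manual-reset");
        System.out.println(store.currentRequest("trigger"));
    }
}
```

With this guard, a delayed or repeated automatic trigger carrying an older revision cannot overwrite a newer request, while a manual reset always goes through.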
*UPDATE*
The initial implementation approach is incorrect in some details:
- We can't use a single key for all zones, because only one zone "wins" in
that case. Each zone needs its own key.
- The update request must carry the set of target partitions to reset instead
of a single tableId, because there is now only one update request per zone on
any topology update.
- Manual reset should reuse the new request transparently.
{code:java}
// old
GroupUpdateRequest(UUID operationId, int catalogVersion, int zoneId, int tableId,
        Set<Integer> partitionIds, boolean manualUpdate)
// new
GroupUpdateRequest(UUID operationId, int catalogVersion, int zoneId,
        Map<Integer, Set<Integer>> partitionIds, boolean manualUpdate)
{code}
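As a rough illustration of the three points above, the sketch below derives a separate trigger key per zone and shows how a manual single-table reset could be wrapped into the new request shape. It assumes the map keys are table IDs, which the signature does not state. The key prefix and the helper names triggerKeyForZone and manualReset are hypothetical, and the class only mirrors the field list of GroupUpdateRequest rather than the real implementation.

```java
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Illustrative only: mirrors the new GroupUpdateRequest field list, not the real class. */
class ZoneScopedRequestSketch {
    static final class GroupUpdateRequest {
        final UUID operationId;
        final int catalogVersion;
        final int zoneId;
        /** Assumption: table ID -> partition IDs to reset in that table. */
        final Map<Integer, Set<Integer>> partitionIds;
        final boolean manualUpdate;

        GroupUpdateRequest(UUID operationId, int catalogVersion, int zoneId,
                Map<Integer, Set<Integer>> partitionIds, boolean manualUpdate) {
            this.operationId = operationId;
            this.catalogVersion = catalogVersion;
            this.zoneId = zoneId;
            this.partitionIds = Map.copyOf(partitionIds);
            this.manualUpdate = manualUpdate;
        }
    }

    /** Hypothetical key layout: one trigger key per zone, so zones never overwrite each other. */
    static String triggerKeyForZone(int zoneId) {
        return "disaster.recovery.trigger." + zoneId;
    }

    /** A manual reset of one table expressed through the same multi-table request. */
    static GroupUpdateRequest manualReset(int catalogVersion, int zoneId, int tableId, Set<Integer> partitions) {
        return new GroupUpdateRequest(
                UUID.randomUUID(), catalogVersion, zoneId, Map.of(tableId, Set.copyOf(partitions)), true);
    }

    public static void main(String[] args) {
        GroupUpdateRequest request = manualReset(1, 7, 42, Set.of(0, 1));

        System.out.println(triggerKeyForZone(request.zoneId) + " -> " + request.partitionIds);
    }
}
```

Wrapping the single-table case into a one-entry map lets manual and automatic resets share one request type, differing only in the manualUpdate flag.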
was:
At the moment, in DisasterRecoveryManager#resetPartitions we have the
unconditional put to the metastore:
{code:java}
private CompletableFuture<Void> processNewRequest(DisasterRecoveryRequest
request) {
UUID operationId = request.operationId();
CompletableFuture<Void> operationFuture = new CompletableFuture<Void>()
.whenComplete((v, throwable) ->
ongoingOperationsById.remove(operationId))
.orTimeout(TIMEOUT_SECONDS, TimeUnit.SECONDS);
ongoingOperationsById.put(operationId, operationFuture);
metaStorageManager.put(RECOVERY_TRIGGER_KEY,
VersionedSerialization.toBytes(request,
DisasterRecoveryRequestSerializer.INSTANCE));
return operationFuture;
}
{code}
Instead, for the automatic calls from HA we need to protect this call by
revision of the trigger event like:
{code:java}
metaStorageManager.invoke(
Conditions.value(RECOVERY_TRIGGER_REVISION).lt(longToBytesKeepingOrder(triggerRevision)),
List.of(
Operations.put(
RECOVERY_TRIGGER_KEY,
VersionedSerialization.toBytes(request,
DisasterRecoveryRequestSerializer.INSTANCE)),
Operations.put(RECOVERY_TRIGGER_REVISION,
triggerRevision)
),
List.of(Operations.noop())
);
{code}
The manual calls of reset, at the same time, should be just simple put as
earlier.
*UPDATE*
The initial implementation approach is not correct in details:
- We can't use one key for all zones, because the only one zone "wins" in this
case. Instead we need the different keys per zone
- The update request must support the list of target partitions for reset,
instead of the tableId. Because now we have only one update request per zone on
any topology update.
{code:java}
// old
GroupUpdateRequest(UUID operationId, int catalogVersion, int zoneId, int
tableId, Set<Integer> partitionIds, boolean manualUpdate) {
// new
GroupUpdateRequest(UUID operationId, int catalogVersion, int zoneId,
Map<Integer, Set<Integer>> partitionIds, boolean manualUpdate)
{code}
> Guard disaster recovery metastore invokes by trigger revision check
> -------------------------------------------------------------------
>
> Key: IGNITE-23728
> URL: https://issues.apache.org/jira/browse/IGNITE-23728
> Project: Ignite
> Issue Type: Improvement
> Reporter: Kirill Gusakov
> Assignee: Kirill Gusakov
> Priority: Major
> Labels: ignite-3
> Time Spent: 40m
> Remaining Estimate: 0h