[
https://issues.apache.org/jira/browse/IGNITE-23599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vyacheslav Koptilin updated IGNITE-23599:
-----------------------------------------
Ignite Flags: (was: Docs Required,Release Notes Required)
> Implement reaction to partitionDistributionReset timer expiration
> -----------------------------------------------------------------
>
> Key: IGNITE-23599
> URL: https://issues.apache.org/jira/browse/IGNITE-23599
> Project: Ignite
> Issue Type: Improvement
> Reporter: Kirill Gusakov
> Assignee: Kirill Gusakov
> Priority: Major
> Labels: ignite-3
> Time Spent: 1h
> Remaining Estimate: 0h
>
> *Motivation*
> This ticket is the final brush stroke for the first phase of IGNITE-23438. We
> need to invoke the reset logic when the partitionDistributionReset timer
> goes off.
> *Definition of Done*
> - The disaster recovery reset mechanism is invoked for HA zones when the
> partitionDistributionReset timer expires
> - No redundant reset requests burden the metastore
> - A failure policy for unsuccessful resets is implemented
> *Implementation details*
> The following dirty pseudocode can be used as a starting point:
> {code:java}
> int catalogVersion = 0; // can be received easily
> long timestamp = 0; // calculated from revision
> CatalogZoneDescriptor zoneDescriptor = catalogManager.zone(zoneId, timestamp);
> List<CatalogTableDescriptor> tables = findTablesByZoneId(zoneId, catalogVersion, catalogManager);
> // Placeholder: must return the logical topology as of the given revision.
> Function<Long, Set<NodeAttributes>> getLogicalTopology = (r) -> Collections.emptySet();
> Set<NodeAttributes> logicalTopology = getLogicalTopology.apply(revision);
> int majority = zoneDescriptor.replicas() / 2 + 1;
> List<CompletableFuture<Void>> tablesResetFuts = new ArrayList<>();
> for (CatalogTableDescriptor table : tables) {
>     Set<Integer> partitionsToReset = new HashSet<>();
>     for (int partId = 0; partId < zoneDescriptor.partitions(); partId++) {
>         TablePartitionId partitionId = new TablePartitionId(table.id(), partId);
>         Assignments stableAssignments = Assignments.fromBytes(
>                 metaStorageManager.getLocally(stablePartAssignmentsKey(partitionId), revision).value());
>         // TODO: convert the logical topology to assignments first, then keep
>         // only the stable nodes that are still present in it.
>         stableAssignments.nodes().retainAll(logicalTopology);
>         // Fewer alive stable nodes than the majority: the partition must be reset.
>         if (stableAssignments.nodes().size() < majority) {
>             partitionsToReset.add(partId);
>         }
>     }
>     tablesResetFuts.add(DisasterRecoveryManager.resetPartition(
>             zoneDescriptor.name(), table.name(), partitionsToReset, false));
> }
> CompletableFuture.allOf(tablesResetFuts.toArray(new CompletableFuture[0])).join();
> {code}
> With the following additions:
> - Handle the resetPartition failures
> - Add an appropriate guard against stale node updates, like the one we
> already have for the other timer, based on the metastore trigger revision
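
The majority check and the two requested additions can be sketched in isolation with plain JDK types. This is only an illustration under stated assumptions, not the Ignite implementation: the class name PartitionResetSketch and the methods tryAcquire, partitionsWithLostMajority and resetAll are hypothetical, nodes are modelled as plain strings, and each reset is modelled as a Supplier of a CompletableFuture rather than a real DisasterRecoveryManager call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names come from the Ignite code base.
final class PartitionResetSketch {
    // Highest metastore revision for which a reset was already triggered.
    private final AtomicLong lastHandledRevision = new AtomicLong(-1);

    // Stale-update guard: true only for a revision newer than any seen before.
    boolean tryAcquire(long revision) {
        long previous = lastHandledRevision.getAndAccumulate(revision, Math::max);
        return revision > previous;
    }

    // Partitions whose alive stable nodes are fewer than replicas / 2 + 1.
    static Set<Integer> partitionsWithLostMajority(
            Map<Integer, Set<String>> stableAssignments, Set<String> aliveNodes, int replicas) {
        int majority = replicas / 2 + 1;
        Set<Integer> result = new TreeSet<>();
        for (Map.Entry<Integer, Set<String>> entry : stableAssignments.entrySet()) {
            long alive = entry.getValue().stream().filter(aliveNodes::contains).count();
            if (alive < majority) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    // Starts every reset and completes with the collected failures,
    // so one failed table does not hide the outcome of the others.
    static CompletableFuture<List<Throwable>> resetAll(List<Supplier<CompletableFuture<Void>>> resets) {
        List<Throwable> failures = Collections.synchronizedList(new ArrayList<>());
        CompletableFuture<?>[] futures = resets.stream()
                .map(reset -> reset.get().handle((ignored, error) -> {
                    if (error != null) {
                        failures.add(error);
                    }
                    return null;
                }))
                .toArray(CompletableFuture[]::new);
        return CompletableFuture.allOf(futures).thenApply(ignored -> List.copyOf(failures));
    }
}
```

A caller would presumably check tryAcquire(revision) first and skip the reset when it returns false, which also avoids sending redundant requests for the same trigger revision. What to do with the failures returned by resetAll (retry, log, escalate) is exactly the failure policy the ticket leaves open.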