[ 
https://issues.apache.org/jira/browse/IGNITE-23599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vyacheslav Koptilin updated IGNITE-23599:
-----------------------------------------
    Ignite Flags:   (was: Docs Required,Release Notes Required)

> Implement reaction to partitionDistributionReset timer expiration
> -----------------------------------------------------------------
>
>                 Key: IGNITE-23599
>                 URL: https://issues.apache.org/jira/browse/IGNITE-23599
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Kirill Gusakov
>            Assignee: Kirill Gusakov
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> *Motivation*
> This ticket is the final brush stroke for the first phase of IGNITE-23438. We
> need to invoke the reset logic when the partitionDistributionReset timer
> goes off.
> *Definition of Done*
> - The disaster recovery reset mechanism is invoked for HA zones when the
> partitionDistributionReset timer expires
> - No redundant reset requests burden the metastore
> - A failure policy for unsuccessful resets is implemented
> *Implementation details*
> The following dirty pseudocode can be used as a starting point:
> {code:java}
> int catalogVersion = 0; // can be received easily
> long timestamp = 0; // calculated from revision
> CatalogZoneDescriptor zoneDescriptor = catalogManager.zone(zoneId, timestamp);
> List<CatalogTableDescriptor> tables = findTablesByZoneId(zoneId, catalogVersion, catalogManager);
> List<CompletableFuture<Void>> tablesResetFuts = new ArrayList<>();
> for (CatalogTableDescriptor table : tables) {
>     Set<Integer> partitionsToReset = new HashSet<>();
>     for (int partId = 0; partId < zoneDescriptor.partitions(); partId++) {
>         TablePartitionId partitionId = new TablePartitionId(table.id(), partId);
>         Assignments stableAssignments = Assignments.fromBytes(
>                 metaStorageManager.getLocally(stablePartAssignmentsKey(partitionId), revision).value());
>         // placeholder: should return the logical topology as of the given revision
>         Function<Long, Set<NodeAttributes>> getLogicalTopology = (r) -> Collections.emptySet();
>         Set<NodeAttributes> logicalTopology = getLogicalTopology.apply(revision);
>         // convert logical topology to assignments and keep only the stable nodes that are still alive
>         stableAssignments.nodes().retainAll(logicalTopology);
>         // majority lost: fewer alive stable nodes than replicas / 2 + 1
>         if (stableAssignments.nodes().size() < zoneDescriptor.replicas() / 2 + 1) {
>             partitionsToReset.add(partId);
>         }
>     }
>     tablesResetFuts.add(DisasterRecoveryManager.resetPartition(
>             zoneDescriptor.name(), table.name(), partitionsToReset, false));
> }
> allOf(tablesResetFuts.toArray(new CompletableFuture[]{})).join();
> {code}
> With the following additions:
> - Handle the resetPartition failures
> - Add an appropriate guard against stale node updates, like the one we
> already have for the other timer, based on the metastore trigger revision
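
The majority check and the two additions listed above can be sketched in plain Java. This is only an illustration under stated assumptions, not the Ignite implementation: the class ResetTimerSketch, the RevisionGuard helper, the use of plain strings as node names, and the boolean result of resetQuietly are all hypothetical and do not come from the ticket. Only the threshold replicas / 2 + 1 and the idea of guarding by the metastore trigger revision are taken from the description.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names exist in Ignite. */
final class ResetTimerSketch {
    /** Majority threshold used in the pseudocode: replicas / 2 + 1. */
    static int quorum(int replicas) {
        return replicas / 2 + 1;
    }

    /** Returns the partitions whose alive stable nodes no longer form a majority. */
    static Set<Integer> partitionsToReset(
            Map<Integer, Set<String>> stableNodesByPartition, Set<String> aliveNodes, int replicas) {
        Set<Integer> result = new TreeSet<>();
        for (Map.Entry<Integer, Set<String>> e : stableNodesByPartition.entrySet()) {
            // Work on a copy so the stable assignments themselves are not mutated.
            Set<String> aliveStable = new HashSet<>(e.getValue());
            aliveStable.retainAll(aliveNodes);
            if (aliveStable.size() < quorum(replicas)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Assumed guard: lets a reset through only for a strictly newer trigger revision. */
    static final class RevisionGuard {
        private final AtomicLong lastHandled = new AtomicLong(-1);

        boolean tryAcquire(long revision) {
            // getAndAccumulate returns the previous value and stores max(previous, revision).
            long previous = lastHandled.getAndAccumulate(revision, Math::max);
            return revision > previous;
        }
    }

    /** Assumed failure handling: turn a failed reset into 'false' instead of failing the caller. */
    static CompletableFuture<Boolean> resetQuietly(Supplier<CompletableFuture<Void>> reset) {
        return reset.get().handle((ignored, error) -> error == null);
    }
}
```

With this shape, a caller would skip the reset request entirely when partitionsToReset returns an empty set or when tryAcquire returns false, which is one possible way to avoid redundant requests to the metastore. What to do when resetQuietly yields false (retry, log, escalate) is left open here, since the ticket does not define the failure policy.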



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
