[
https://issues.apache.org/jira/browse/IGNITE-20603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mirza Aliev updated IGNITE-20603:
---------------------------------
Epic Link: IGNITE-20611 (was: IGNITE-20166)
> Restore topologyAugmentationMap on a node restart
> -------------------------------------------------
>
> Key: IGNITE-20603
> URL: https://issues.apache.org/jira/browse/IGNITE-20603
> Project: Ignite
> Issue Type: Bug
> Reporter: Mirza Aliev
> Priority: Major
> Labels: ignite-3
>
> h3. *Motivation*
> Some events may have been propagated to {{ms.logicalTopology}}, but the
> node restarted while topologyAugmentationMap was being updated in
> {{DistributionZoneManager#createMetastorageTopologyListener}}. In that case
> the augmentation that should have been added to
> {{zone.topologyAugmentationMap}} is missing, and we need to recover it.
> h3. *Definition of done*
> On a node restart, topologyAugmentationMap must be correctly restored
> according to {{ms.logicalTopology}} state.
> h3. *Implementation notes*
> For every zone, compare {{MS.local.logicalTopology.revision}} with
> max(maxScUpFromMap, maxScDownFromMap). If
> {{MS.local.logicalTopology.revision}} is greater, some topology changes were
> not propagated to topologyAugmentationMap before the restart, and the
> corresponding timers were not scheduled.
>
> To fill the gap in topologyAugmentationMap, compare
> {{MS.local.logicalTopology}} with {{lastSeenTopology}} and add to
> topologyAugmentationMap the nodes that were not propagated to it before the
> restart. {{lastSeenTopology}} is calculated as follows: read
> {{MS.local.dataNodes}}, take max(scaleUpTriggerKey, scaleDownTriggerKey),
> and retrieve from topologyAugmentationMap all additions and removals of
> nodes, using that maximum as the left bound. Then apply these changes to the
> map of node counters from {{MS.local.dataNodes}} and keep only the nodes
> with positive counters. The result is {{lastSeenTopology}}.
>
> Comparing {{lastSeenTopology}} with {{MS.local.logicalTopology}} shows which
> node additions and removals were not propagated to topologyAugmentationMap
> before the restart. Add these differences to topologyAugmentationMap, using
> {{MS.local.logicalTopology.revision}} as the revision (the key in
> topologyAugmentationMap). It is safe to use this revision: if a node was
> added to {{ms.topology}} after an immediate data nodes recalculation, that
> added node must restore the intent of that immediate recalculation.
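
The recovery steps described in the implementation notes can be sketched roughly as follows. This is a simplified, hypothetical model and not the actual Ignite code: the class {{TopologyAugmentationRecovery}}, the {{Change}} record, the use of plain node names, and the method signatures are all assumptions made for illustration. It also assumes that the left bound is exclusive, that max(maxScUpFromMap, maxScDownFromMap) is the largest key in the map, and that nothing is stored when there are no differences.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical, simplified model of the recovery; not the actual Ignite implementation. */
final class TopologyAugmentationRecovery {
    /** Assumed shape of one topologyAugmentationMap entry: nodes added and removed at a revision. */
    record Change(Set<String> added, Set<String> removed) {}

    /** Rebuilds lastSeenTopology from the dataNodes counters and the later augmentations. */
    static Set<String> lastSeenTopology(
            Map<String, Integer> dataNodes,
            NavigableMap<Long, Change> augmentationMap,
            long scaleUpTriggerKey,
            long scaleDownTriggerKey) {
        long leftBound = Math.max(scaleUpTriggerKey, scaleDownTriggerKey);
        Map<String, Integer> counters = new HashMap<>(dataNodes);

        // Assumption: the bound is exclusive, because dataNodes already reflects that revision.
        for (Change change : augmentationMap.tailMap(leftBound, false).values()) {
            change.added().forEach(node -> counters.merge(node, 1, Integer::sum));
            change.removed().forEach(node -> counters.merge(node, -1, Integer::sum));
        }

        // Keep only the nodes whose counter is positive.
        Set<String> result = new HashSet<>();
        counters.forEach((node, counter) -> {
            if (counter > 0) {
                result.add(node);
            }
        });
        return result;
    }

    /** Adds the missing differences under the logical topology revision; returns true if an entry was added. */
    static boolean restore(
            Set<String> logicalTopology,
            long logicalTopologyRevision,
            Map<String, Integer> dataNodes,
            NavigableMap<Long, Change> augmentationMap,
            long scaleUpTriggerKey,
            long scaleDownTriggerKey) {
        // Assumption: max(maxScUpFromMap, maxScDownFromMap) equals the largest key in the map.
        long maxRevisionInMap = augmentationMap.isEmpty() ? Long.MIN_VALUE : augmentationMap.lastKey();
        if (logicalTopologyRevision <= maxRevisionInMap) {
            return false; // Every topology change has already been propagated.
        }

        Set<String> lastSeen =
                lastSeenTopology(dataNodes, augmentationMap, scaleUpTriggerKey, scaleDownTriggerKey);

        Set<String> added = new HashSet<>(logicalTopology);
        added.removeAll(lastSeen);
        Set<String> removed = new HashSet<>(lastSeen);
        removed.removeAll(logicalTopology);

        // Assumption: nothing is stored when there are no differences.
        if (added.isEmpty() && removed.isEmpty()) {
            return false;
        }

        augmentationMap.put(logicalTopologyRevision, new Change(added, removed));
        return true;
    }
}
```

In this sketch, a second call with the same logical topology revision is a no-op, because the map then already contains an entry at that revision. Scheduling the scale up and scale down timers mentioned in the notes is left out.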
--
This message was sent by Atlassian Jira
(v8.20.10#820010)