[ https://issues.apache.org/jira/browse/IGNITE-20603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mirza Aliev updated IGNITE-20603:
---------------------------------
Epic Link: IGNITE-20611 (was: IGNITE-20166)

Restore topologyAugmentationMap on a node restart
-------------------------------------------------

Key: IGNITE-20603
URL: https://issues.apache.org/jira/browse/IGNITE-20603
Project: Ignite
Issue Type: Bug
Reporter: Mirza Aliev
Priority: Major
Labels: ignite-3

h3. *Motivation*
Some events may already have been propagated to {{ms.logicalTopology}} when the node restarted while topologyAugmentationMap was being updated in {{DistributionZoneManager#createMetastorageTopologyListener}}. In that case, the augmentation that should have been added to {{zone.topologyAugmentationMap}} was never added, and this information must be recovered.

h3. *Definition of done*
On a node restart, topologyAugmentationMap must be correctly restored according to the {{ms.logicalTopology}} state.

h3. *Implementation notes*
For every zone, compare {{MS.local.logicalTopology.revision}} with max(maxScUpFromMap, maxScDownFromMap). If {{logicalTopology.revision}} is greater, some topology changes were not propagated to topologyAugmentationMap before the restart, and the corresponding timers were not scheduled. To fill the gap, compare {{MS.local.logicalTopology}} with {{lastSeenLogicalTopology}} and add to topologyAugmentationMap the nodes that were not propagated to it before the restart.

{{lastSeenLogicalTopology}} is calculated as follows. Read {{MS.local.dataNodes}}, which holds a counter for each node. Take max(scaleUpTriggerKey, scaleDownTriggerKey) and use it as the left bound to retrieve all node additions and removals from topologyAugmentationMap. Apply these changes to the node counters from {{MS.local.dataNodes}} and keep only the nodes with positive counters.
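The gap check and the {{lastSeenLogicalTopology}} calculation described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Ignite implementation. The class {{LastSeenTopologySketch}} and the record {{Augmentation(added, removed)}} are hypothetical, and node names stand in for real node objects. maxScUpFromMap, maxScDownFromMap and the two trigger revisions are passed in as plain {{long}} values. The sketch also assumes that the left bound is exclusive: entries at or below max(scaleUpTriggerKey, scaleDownTriggerKey) are treated as already reflected in {{MS.local.dataNodes}}.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: restart-time gap check and lastSeenLogicalTopology calculation. */
class LastSeenTopologySketch {
    /** Hypothetical augmentation entry: nodes added and removed at one revision. */
    record Augmentation(Set<String> added, Set<String> removed) {}

    /** True if the logical topology revision is ahead of what the augmentation map recorded. */
    static boolean hasUnpropagatedChanges(long logicalTopologyRevision, long maxScUpFromMap, long maxScDownFromMap) {
        return logicalTopologyRevision > Math.max(maxScUpFromMap, maxScDownFromMap);
    }

    static Set<String> lastSeenLogicalTopology(
            Map<String, Integer> dataNodes,                  // MS.local.dataNodes: node -> counter
            NavigableMap<Long, Augmentation> augmentations, // topologyAugmentationMap: revision -> change
            long scaleUpTriggerRevision,
            long scaleDownTriggerRevision) {
        long leftBound = Math.max(scaleUpTriggerRevision, scaleDownTriggerRevision);
        Map<String, Integer> counters = new HashMap<>(dataNodes);

        // Assumption: the left bound is exclusive (entries at or below it are already in dataNodes).
        for (Augmentation a : augmentations.tailMap(leftBound, false).values()) {
            a.added().forEach(node -> counters.merge(node, 1, Integer::sum));
            a.removed().forEach(node -> counters.merge(node, -1, Integer::sum));
        }

        // Only nodes whose counter stays positive are considered present.
        Set<String> result = new TreeSet<>();
        counters.forEach((node, counter) -> {
            if (counter > 0) {
                result.add(node);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<Long, Augmentation> map = new TreeMap<>();
        map.put(5L, new Augmentation(Set.of("A"), Set.of()));    // at or below the bound: already applied
        map.put(8L, new Augmentation(Set.of("D"), Set.of("B"))); // above the bound: not yet applied
        Map<String, Integer> dataNodes = Map.of("A", 1, "B", 1, "C", 1);

        System.out.println(hasUnpropagatedChanges(10L, 8L, 8L));             // true
        System.out.println(lastSeenLogicalTopology(dataNodes, map, 5L, 3L)); // [A, C, D]
    }
}
```

Counting per node, rather than using a plain set, mirrors the counters kept in {{MS.local.dataNodes}}. A node that was removed and later re-added ends up with a positive counter again.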
The set of nodes obtained this way is {{lastSeenLogicalTopology}}. Comparing it with {{MS.local.logicalTopology}} shows which node additions and removals were not propagated to topologyAugmentationMap before the restart. Nodes present only in {{MS.local.logicalTopology}} were added, and nodes present only in {{lastSeenLogicalTopology}} were removed. Add these differences to topologyAugmentationMap, using {{MS.local.logicalTopology.revision}} as the revision (the key of topologyAugmentationMap). Taking this revision is safe: if a node was added to {{ms.topology}} after an immediate data nodes recalculation, that added node must restore the intent of this immediate data nodes recalculation.
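The final step can be sketched as follows. It diffs {{lastSeenLogicalTopology}} against {{MS.local.logicalTopology}} and records the difference under {{MS.local.logicalTopology.revision}}. As before, this is a hypothetical sketch, not Ignite code. {{AugmentationGapFiller}}, {{fillGap}} and the {{Augmentation(added, removed)}} record are illustrative names, and node names stand in for node objects. The sketch assumes that storing both additions and removals in one entry per revision is acceptable. Scheduling the scale-up/scale-down timers is out of scope here.

```java
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: fill the topologyAugmentationMap gap after a restart. */
class AugmentationGapFiller {
    /** Hypothetical augmentation entry: nodes added and removed at one revision. */
    record Augmentation(Set<String> added, Set<String> removed) {}

    /**
     * Records the difference between lastSeenLogicalTopology and the current logical topology
     * under logicalTopologyRevision. Returns false if there is nothing to record.
     */
    static boolean fillGap(
            NavigableMap<Long, Augmentation> augmentations,
            Set<String> lastSeenLogicalTopology,
            Set<String> logicalTopology,
            long logicalTopologyRevision) {
        // In the logical topology, but never propagated as an addition.
        Set<String> added = new TreeSet<>(logicalTopology);
        added.removeAll(lastSeenLogicalTopology);

        // Seen before the restart, but no longer in the logical topology.
        Set<String> removed = new TreeSet<>(lastSeenLogicalTopology);
        removed.removeAll(logicalTopology);

        if (added.isEmpty() && removed.isEmpty()) {
            return false;
        }
        // Assumption: the gap check already ensured this revision is above every existing key.
        augmentations.put(logicalTopologyRevision, new Augmentation(added, removed));
        return true;
    }

    public static void main(String[] args) {
        NavigableMap<Long, Augmentation> map = new TreeMap<>();
        fillGap(map, Set.of("A", "C", "D"), Set.of("A", "D", "E"), 12L);
        System.out.println(map); // {12=Augmentation[added=[E], removed=[C]]}
    }
}
```

Returning {{false}} when the two sets match avoids writing an empty entry. Such an entry would carry no nodes but would still advance the map's maximum revision.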