[
https://issues.apache.org/jira/browse/IGNITE-20603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789996#comment-17789996
]
Vladislav Pyatkov commented on IGNITE-20603:
--------------------------------------------
[~maliev] Thank you for your contribution.
Merged 95107c31be013298108deeeb1322874a9952a40a
> Restore logical topology change event on a node restart
> -------------------------------------------------------
>
> Key: IGNITE-20603
> URL: https://issues.apache.org/jira/browse/IGNITE-20603
> Project: Ignite
> Issue Type: Bug
> Reporter: Mirza Aliev
> Assignee: Mirza Aliev
> Priority: Major
> Labels: ignite-3
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> h3. *Motivation*
> It is possible that some events were propagated to {{ms.logicalTopology}},
> but a restart happened while we were updating topologyAugmentationMap and other
> states in {{DistributionZoneManager#createMetastorageTopologyListener}}. That
> means that either an augmentation that must be added to
> {{zone.topologyAugmentationMap}} wasn't added and we need to recover this
> information, or nodesAttributes weren't propagated to MS.
> h3. *Definition of done*
> On a node restart, all states that were going to be updated during the watch
> event in {{DistributionZoneManager#createMetastorageTopologyListener}} must
> be recovered.
> h3. *Implementation notes*
> (outdated, see UPD)
> For every zone, compare {{MS.local.logicalTopology.revision}} with
> max(maxScUpFromMap, maxScDownFromMap). If {{logicalTopology.revision}} is
> greater than max(maxScUpFromMap, maxScDownFromMap), that means that some
> topology changes haven't been propagated to topologyAugmentationMap before
> restart and appropriate timers haven't been scheduled. To fill the gap in
> topologyAugmentationMap, compare {{MS.local.logicalTopology}} with
> {{lastSeenLogicalTopology}} and enhance topologyAugmentationMap with the
> nodes that did not have time to be propagated to topologyAugmentationMap
> before restart. {{lastSeenTopology}} is calculated in the following way: we
> read {{MS.local.dataNodes}}, also we take max(scaleUpTriggerKey,
> scaleDownTriggerKey) and retrieve all additions and removals of nodes from
> the topologyAugmentationMap using max(scaleUpTriggerKey, scaleDownTriggerKey)
> as the left bound. After that, apply these changes to the map of node
> counters from {{MS.local.dataNodes}} and keep only the nodes with positive
> counters. This is the lastSeenTopology. Comparing it with
> {{MS.local.logicalTopology}} will tell us which nodes were not added or
> removed and weren't propagated to topologyAugmentationMap before restart. We
> take these differences and add them to the topologyAugmentationMap. As a
> revision (key for topologyAugmentationMap) take
> {{MS.local.logicalTopology.revision}}. It is safe to take this revision,
> because if some node was added to the {{ms.topology}} after immediate data
> nodes recalculation, this added node must restore this immediate data nodes'
> recalculation intent.
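
The lastSeenTopology calculation described in these (now outdated) notes can be illustrated with a minimal sketch. Everything in it is hypothetical: the class, the Augmentation type, and the method names are stand-ins, not Ignite classes, and nodes are plain strings. It also assumes the left bound is exclusive, which the notes do not state.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only: none of these types are Ignite classes.
class LastSeenTopologySketch {
    /** Nodes added (addition == true) or removed (addition == false) at one revision. */
    static final class Augmentation {
        final Set<String> nodes;
        final boolean addition;

        Augmentation(Set<String> nodes, boolean addition) {
            this.nodes = nodes;
            this.addition = addition;
        }
    }

    /** Applies augmentations newer than the trigger keys to the data node counters. */
    static Set<String> lastSeenTopology(
            Map<String, Integer> dataNodeCounters,
            NavigableMap<Long, Augmentation> topologyAugmentationMap,
            long scaleUpTriggerKey,
            long scaleDownTriggerKey) {
        long leftBound = Math.max(scaleUpTriggerKey, scaleDownTriggerKey);
        Map<String, Integer> counters = new HashMap<>(dataNodeCounters);

        // Assumption: the left bound is exclusive, i.e. changes at or below it
        // are already reflected in the data node counters.
        for (Augmentation aug : topologyAugmentationMap.tailMap(leftBound, false).values()) {
            int delta = aug.addition ? 1 : -1;
            for (String node : aug.nodes) {
                counters.merge(node, delta, Integer::sum);
            }
        }

        // Keep only the nodes whose counter is positive.
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counters.entrySet()) {
            if (e.getValue() > 0) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Returns the elements of a that are not in b. */
    static Set<String> difference(Set<String> a, Set<String> b) {
        Set<String> r = new TreeSet<>(a);
        r.removeAll(b);
        return r;
    }

    public static void main(String[] args) {
        Map<String, Integer> dataNodes = new HashMap<>();
        dataNodes.put("A", 1);
        dataNodes.put("B", 1);

        NavigableMap<Long, Augmentation> augmentations = new TreeMap<>();
        augmentations.put(7L, new Augmentation(Set.of("B"), false));
        augmentations.put(8L, new Augmentation(Set.of("D"), true));

        Set<String> lastSeen = lastSeenTopology(dataNodes, augmentations, 5L, 3L);
        Set<String> logicalTopology = Set.of("A", "D", "E");

        System.out.println("lastSeenTopology = " + lastSeen);
        System.out.println("missed additions = " + difference(logicalTopology, lastSeen));
        System.out.println("missed removals  = " + difference(lastSeen, logicalTopology));
    }
}
```

In this sketch the two differences are what would be added to topologyAugmentationMap under the key MS.local.logicalTopology.revision.
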
> UPD: The implementation notes above are outdated; we've implemented a slightly
> different approach. Now we save the last handled topology to MS. On restart we
> restore global states from the local metastorage and check whether
> the current ms.logicalTopology differs from the one that was handled in
> DistributionZoneManager#createMetastorageTopologyListener (we compare the
> revisions of these events). If it differs, we repeat the logic from
> DistributionZoneManager#createMetastorageTopologyListener with the new
> logical topology from ms.logicalTopology.
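
The approach from the UPD note can be sketched roughly as below. This is not the merged code: the class, its fields, and the method names are hypothetical, the metastorage is reduced to two in-memory fields, and the real listener logic (augmentation map, timers, node attributes) is replaced by a list of handled revisions. It assumes that "differs" means the two revisions are not equal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: not the DistributionZoneManager implementation.
class TopologyRecoverySketch {
    /** A logical topology together with the metastorage revision that produced it. */
    static final class VersionedTopology {
        final long revision;
        final Set<String> nodes;

        VersionedTopology(long revision, Set<String> nodes) {
            this.revision = revision;
            this.nodes = nodes;
        }
    }

    // Stand-ins for values kept in the metastorage (survive a restart).
    VersionedTopology logicalTopology;
    VersionedTopology lastHandledTopology;

    // Stand-in for the states updated by the topology listener.
    final List<Long> handledRevisions = new ArrayList<>();

    /** Stand-in for the listener logic: update states, then save what was handled. */
    void onLogicalTopologyUpdate(VersionedTopology topology) {
        handledRevisions.add(topology.revision);
        lastHandledTopology = topology;
    }

    /** On restart, replays the listener logic if the latest topology was not handled. */
    boolean recoverOnRestart() {
        if (logicalTopology == null) {
            return false; // nothing was ever written, nothing to recover
        }
        // Assumption: an unequal revision means the event was not handled.
        boolean unhandled = lastHandledTopology == null
                || lastHandledTopology.revision != logicalTopology.revision;
        if (unhandled) {
            onLogicalTopologyUpdate(logicalTopology);
        }
        return unhandled;
    }

    public static void main(String[] args) {
        TopologyRecoverySketch zoneManager = new TopologyRecoverySketch();

        // Revision 3 was written and fully handled before the crash.
        zoneManager.logicalTopology = new VersionedTopology(3, Set.of("A"));
        zoneManager.onLogicalTopologyUpdate(zoneManager.logicalTopology);

        // Revision 5 was written, but the node restarted before handling it.
        zoneManager.logicalTopology = new VersionedTopology(5, Set.of("A", "B"));

        System.out.println("replayed on first restart:  " + zoneManager.recoverOnRestart());
        System.out.println("replayed on second restart: " + zoneManager.recoverOnRestart());
    }
}
```

Because the last handled topology is saved only after the states are updated, a crash in between leaves the revisions unequal and the same logic runs again on the next start.
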
--
This message was sent by Atlassian Jira
(v8.20.10#820010)