[
https://issues.apache.org/jira/browse/IGNITE-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17705856#comment-17705856
]
Sergey Uttsel commented on IGNITE-19104:
----------------------------------------
Actually, we have a race between the asynchronous initialization of
'logicalTopology' in initDataNodesFromVaultManager() and the updates to
'logicalTopology' in DistributionZoneManager#watchListener. So we need to
initialize 'logicalTopology' synchronously in DistributionZoneManager#start().
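To illustrate the race, here is a minimal sketch. It is not the real DistributionZoneManager code: the class TopologyInitSketch, its methods, and the node names are all hypothetical, and the Vault read is modelled as a plain CompletableFuture that the caller completes by hand, so the interleaving is deterministic.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for DistributionZoneManager; not the real Ignite class. */
class TopologyInitSketch {
    /** Starts as an empty topology, like the initial value mentioned in the ticket. */
    private volatile Set<String> logicalTopology = Set.of();

    /** Racy variant: the Vault read completes later and overwrites the field unconditionally. */
    void startAsync(CompletableFuture<Set<String>> vaultRead) {
        vaultRead.thenAccept(nodes -> logicalTopology = nodes);
    }

    /** Sync variant: start() does not return until the Vault value has been applied. */
    void startSync(CompletableFuture<Set<String>> vaultRead) {
        logicalTopology = vaultRead.join();
    }

    /** Models the watch listener applying a newer topology. */
    void onTopologyUpdate(Set<String> nodes) {
        logicalTopology = nodes;
    }

    Set<String> logicalTopology() {
        return logicalTopology;
    }

    /** The listener update arrives before the delayed Vault read completes. */
    static Set<String> racyOutcome() {
        TopologyInitSketch manager = new TopologyInitSketch();
        CompletableFuture<Set<String>> vaultRead = new CompletableFuture<>();
        manager.startAsync(vaultRead);
        manager.onTopologyUpdate(Set.of("A", "B")); // newer value from the listener
        vaultRead.complete(Set.of("A"));            // stale Vault value lands last
        return manager.logicalTopology();
    }

    /** The Vault value is applied inside start(), so the listener update wins. */
    static Set<String> syncOutcome() {
        TopologyInitSketch manager = new TopologyInitSketch();
        manager.startSync(CompletableFuture.completedFuture(Set.of("A")));
        manager.onTopologyUpdate(Set.of("A", "B"));
        return manager.logicalTopology();
    }
}
```

In this model racyOutcome() ends with the stale Vault value, while syncOutcome() keeps the listener's newer value. Whether blocking in start() is acceptable in the real component is a separate question that the sketch does not address.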
Another issue is a race caused by the asynchronous invocation of
DistributionZoneManager#saveDataNodesAndUpdateTriggerKeysInMetaStorage in
DistributionZoneManager#start(). This method uses zonesChangeTriggerKey(zoneId)
as the condition for its metastorage invoke, while the metastorage invokes in
DistributionZoneManager#watchListener, which may run in parallel, use
zoneScaleUpChangeTriggerKey(zoneId)/zoneScaleDownChangeTriggerKey(zoneId) as
their condition.
So I think we need to invoke
DistributionZoneManager#saveDataNodesAndUpdateTriggerKeysInMetaStorage
synchronously in DistributionZoneManager#start().
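The second race can be sketched with a toy in-memory store. Everything here is an assumption made for illustration: TriggerKeySketch is hypothetical, and its invoke() rule ("apply only if the revision is newer than the one stored under the guard key") is a simplified guess at how a trigger-key condition behaves, not the real MetaStorage API. Only the key names come from the comment above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy store; the real MetaStorage conditional invoke is not modelled exactly. */
class TriggerKeySketch {
    private final Map<String, Long> triggerKeys = new HashMap<>();
    private String dataNodes = "";

    /** Applies the update only if 'revision' is newer than the one stored under 'guardKey'. */
    synchronized boolean invoke(String guardKey, long revision, String newDataNodes) {
        long seen = triggerKeys.getOrDefault(guardKey, 0L);
        if (revision <= seen) {
            return false; // condition failed, nothing is written
        }
        triggerKeys.put(guardKey, revision);
        dataNodes = newDataNodes;
        return true;
    }

    synchronized String dataNodes() {
        return dataNodes;
    }

    /** The delayed start() invoke lands after the listener's invoke. */
    static String interleaved() {
        TriggerKeySketch store = new TriggerKeySketch();
        store.invoke("zoneScaleUpChangeTriggerKey", 5, "A,B"); // listener, newer revision
        // Older revision, but guarded by a different key, so the condition still passes.
        store.invoke("zonesChangeTriggerKey", 3, "A");
        return store.dataNodes();
    }

    /** The start() invoke has finished before the listener's invoke runs. */
    static String ordered() {
        TriggerKeySketch store = new TriggerKeySketch();
        store.invoke("zonesChangeTriggerKey", 3, "A");
        store.invoke("zoneScaleUpChangeTriggerKey", 5, "A,B");
        return store.dataNodes();
    }
}
```

Because the two invokes are guarded by different keys, neither condition observes the other's write. In this model the late start() invoke in interleaved() overwrites the newer data nodes, while ordered() keeps them.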
> Late logicalTopology initialization in DistributionZoneManager
> --------------------------------------------------------------
>
> Key: IGNITE-19104
> URL: https://issues.apache.org/jira/browse/IGNITE-19104
> Project: Ignite
> Issue Type: Bug
> Reporter: Andrey Mashenkov
> Priority: Major
> Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> DistributionZoneManager runs the following methods on start:
> {code:java}
> initDataNodesFromVaultManager();
> initLogicalTopologyAndVersionInMetaStorageOnStart();
> {code}
> The first method gets logicalTopology from Vault and tries to put it into
> MetaStorage.
> The second one gets logicalTopology from CMG and tries to put it into
> MetaStorage.
> Both methods are actually asynchronous, because Vault.get() and
> TopologyService.logicalTopologyOnLeader() are async.
> There are 2 issues:
> * these methods may run concurrently in separate threads
> * we unconditionally rewrite the local volatile field 'logicalTopology' in
> initDataNodesFromVaultManager()
> Thus, we may see the initial value (empty topology) after
> DistributionZoneManager.start() finishes.
> Also, it seems there is a chance of seeing a stale value from Vault: a new
> value is obtained from config and then overwritten by the stale value.
> DistributionZoneManagerConfigurationChangesTest passes because the test
> Metastorage initialization happens before
> DistributionZoneManagerConfigurationChangesTest starts (in reality, they
> start in a different order),
> and because test initialization seems to be a bit slower than
> DistributionZoneManagerConfigurationChangesTest.start().