[ https://issues.apache.org/jira/browse/IGNITE-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17705856#comment-17705856 ]

Sergey Uttsel commented on IGNITE-19104:
----------------------------------------

Actually, we have a race between the async 'logicalTopology' initialization in
initDataNodesFromVaultManager() and the 'logicalTopology' update in
DistributionZoneManager#watchListener. So we need to initialize
'logicalTopology' synchronously in DistributionZoneManager#start().
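
A minimal sketch of that race, under stated assumptions: TopologyHolderSketch, startAsync(), startSync() and onTopologyWatchEvent() are hypothetical names (not Ignite code), and the Vault read is modelled as a bare CompletableFuture. It contrasts an async start(), where a late Vault result can overwrite a newer topology delivered by the watch, with a start() that joins the read before returning.

{code:java}
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for the logicalTopology handling; not the real DistributionZoneManager. */
public class TopologyHolderSketch {
    private volatile Set<String> logicalTopology = Set.of();

    /** Hypothetical async Vault read; completes whenever the Vault answers. */
    private final CompletableFuture<Set<String>> vaultRead;

    public TopologyHolderSketch(CompletableFuture<Set<String>> vaultRead) {
        this.vaultRead = vaultRead;
    }

    /** Racy variant: returns immediately; the assignment runs whenever the read completes. */
    public void startAsync() {
        vaultRead.thenAccept(topology -> logicalTopology = topology);
    }

    /** Fixed variant: blocks until the Vault value has been applied. */
    public void startSync() {
        logicalTopology = vaultRead.join();
    }

    /** Stand-in for the update performed by the metastorage watch listener. */
    public void onTopologyWatchEvent(Set<String> newTopology) {
        logicalTopology = newTopology;
    }

    public Set<String> logicalTopology() {
        return logicalTopology;
    }

    public static void main(String[] args) {
        // Racy: the watch delivers the newer topology before the Vault read completes.
        CompletableFuture<Set<String>> slowVault = new CompletableFuture<>();
        TopologyHolderSketch racy = new TopologyHolderSketch(slowVault);
        racy.startAsync();
        racy.onTopologyWatchEvent(Set.of("A", "B"));
        slowVault.complete(Set.of("A")); // the late, stale value overwrites the newer one
        System.out.println("async start: " + racy.logicalTopology());

        // Fixed: start() returns only after the Vault value is in place, so the watch update lands last.
        TopologyHolderSketch fixed = new TopologyHolderSketch(CompletableFuture.completedFuture(Set.of("A")));
        fixed.startSync();
        fixed.onTopologyWatchEvent(Set.of("A", "B"));
        System.out.println("sync start: " + fixed.logicalTopology().size() + " nodes");
    }
}
{code}

Running main prints "async start: [A]" (the newer [A, B] from the watch is lost) and then "sync start: 2 nodes". The sync variant only helps if watch events cannot be processed before start() returns, which this sketch assumes.
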
Another issue is a race caused by the async invocation of
DistributionZoneManager#saveDataNodesAndUpdateTriggerKeysInMetaStorage in
DistributionZoneManager#start(). This method uses zonesChangeTriggerKey(zoneId)
as the condition for its MetaStorage invoke, while the concurrent MetaStorage
invokes in DistributionZoneManager#watchListener use
zoneScaleUpChangeTriggerKey(zoneId)/zoneScaleDownChangeTriggerKey(zoneId) as
their conditions, so the two kinds of invokes do not guard against each other.
So I think we need to invoke
DistributionZoneManager#saveDataNodesAndUpdateTriggerKeysInMetaStorage
synchronously in DistributionZoneManager#start().
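
To make the second race concrete, here is a toy model. ConditionalInvokeSketch is a hypothetical in-memory stand-in, not the MetaStorage API; the key names, the revisions and the "stored revision is below the new one" condition are assumptions chosen for illustration. It shows that two conditional invokes guarded by different keys can both succeed, so whichever lands last decides the data nodes.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical in-memory stand-in for a MetaStorage conditional invoke; not the Ignite API.
 * Each invoke writes its updates only if the revision stored under its own condition key is lower.
 */
public class ConditionalInvokeSketch {
    // Illustrative key names, loosely modelled on the trigger keys mentioned above.
    static final String DATA_NODES = "dataNodes";
    static final String ZONES_CHANGE_TRIGGER = "zonesChangeTrigger";
    static final String SCALE_UP_TRIGGER = "zoneScaleUpChangeTrigger";

    private final Map<String, Object> store = new HashMap<>();

    /** Atomically applies updates iff the revision under conditionKey is below revision. */
    public synchronized boolean invoke(String conditionKey, long revision, Map<String, Object> updates) {
        long current = (Long) store.getOrDefault(conditionKey, -1L);
        if (current >= revision) {
            return false;
        }
        store.putAll(updates);
        return true;
    }

    public synchronized Object get(String key) {
        return store.get(key);
    }

    /** Write issued from start(), computed from older state (revision 3). */
    static boolean startWrite(ConditionalInvokeSketch ms) {
        return ms.invoke(ZONES_CHANGE_TRIGGER, 3,
                Map.of(DATA_NODES, Set.of("A"), ZONES_CHANGE_TRIGGER, 3L));
    }

    /** Write issued by the watch listener for a newer scale-up event (revision 5). */
    static boolean watchWrite(ConditionalInvokeSketch ms) {
        return ms.invoke(SCALE_UP_TRIGGER, 5,
                Map.of(DATA_NODES, Set.of("A", "B"), SCALE_UP_TRIGGER, 5L));
    }

    public static void main(String[] args) {
        // Async start(): the watch write lands first, then the delayed start() write.
        // Both conditions pass because they check different keys, so the stale data nodes win.
        ConditionalInvokeSketch racy = new ConditionalInvokeSketch();
        watchWrite(racy);
        startWrite(racy);
        System.out.println("async start: " + racy.get(DATA_NODES));

        // Sync start(): the start() write completes before the watch listener runs.
        ConditionalInvokeSketch ordered = new ConditionalInvokeSketch();
        startWrite(ordered);
        watchWrite(ordered);
        System.out.println("sync start: " + ((Set<?>) ordered.get(DATA_NODES)).size() + " nodes");
    }
}
{code}

main prints "async start: [A]", i.e. the delayed start() write replaces the newer data nodes, and then "sync start: 2 nodes" when the start() write is finished before the watch listener runs.
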

> Late logicalTopology initialization in DistributionZoneManager
> --------------------------------------------------------------
>
>                 Key: IGNITE-19104
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19104
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Andrey Mashenkov
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0.0-beta2
>
>
> DistributionZoneManager runs the following methods on start:
> {code:java}
> initDataNodesFromVaultManager();
> initLogicalTopologyAndVersionInMetaStorageOnStart();
> {code}
> The first method gets logicalTopology from Vault and tries to put it into
> MetaStorage.
> The second one gets logicalTopology from CMG and tries to put it into
> MetaStorage.
> Both methods are actually asynchronous, because Vault.get() and
> TopologyService.logicalTopologyOnLeader() are async.
> There are 2 issues:
> * these methods may run concurrently in separate threads
> * we unconditionally rewrite the local volatile field 'logicalTopology' in
> initDataNodesFromVaultManager()
> Thus, we may see the initial value (empty topology) after
> DistributionZoneManager.start() finishes.
> Also, it seems there is a chance to see a stale value from Vault: a new
> value may be obtained from config and then overwritten by the stale value.
> DistributionZoneManagerConfigurationChangesTest passes because the test
> MetaStorage initialization happens before
> DistributionZoneManagerConfigurationChangesTest starts (in reality, they
> start in a different order),
> and because the test initialization seems a bit slower than
> DistributionZoneManagerConfigurationChangesTest.start().
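
For the first issue in the description (the two initializations running concurrently), one possible shape is to chain them and wait in start(). This is a sketch, not the fix applied in the ticket: SequencedInitSketch, its Supplier-based readers (stand-ins for Vault.get() and TopologyService.logicalTopologyOnLeader()) and delayed() are hypothetical, and it assumes, as the description suggests, that the Vault value may be stale while the value obtained on start is the fresher one.

{code:java}
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical sketch: the two start-up initializations run one after another, not concurrently. */
public class SequencedInitSketch {
    private volatile Set<String> logicalTopology = Set.of();

    // Hypothetical async readers standing in for Vault.get() and TopologyService.logicalTopologyOnLeader().
    private final Supplier<CompletableFuture<Set<String>>> vaultRead;
    private final Supplier<CompletableFuture<Set<String>>> cmgRead;

    public SequencedInitSketch(Supplier<CompletableFuture<Set<String>>> vaultRead,
                               Supplier<CompletableFuture<Set<String>>> cmgRead) {
        this.vaultRead = vaultRead;
        this.cmgRead = cmgRead;
    }

    public void start() {
        vaultRead.get()
                .thenAccept(topology -> logicalTopology = topology) // possibly stale Vault value first
                // The CMG read is only issued after the Vault value has been applied,
                // so the Vault callback can no longer overwrite the CMG value.
                .thenCompose(ignored -> cmgRead.get())
                .thenAccept(topology -> logicalTopology = topology)
                .join(); // start() returns only after both steps are done
    }

    public Set<String> logicalTopology() {
        return logicalTopology;
    }

    /** Completes asynchronously after the given delay; used to simulate a slow read. */
    static CompletableFuture<Set<String>> delayed(Set<String> value, long millis) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return value;
        });
    }

    public static void main(String[] args) {
        // The simulated Vault read is deliberately slower than the CMG read.
        SequencedInitSketch zm = new SequencedInitSketch(
                () -> delayed(Set.of("A"), 100),
                () -> delayed(Set.of("A", "B"), 0));
        zm.start();
        System.out.println(zm.logicalTopology().size() + " nodes");
    }
}
{code}

Even though the simulated Vault read is slower, main prints "2 nodes": the CMG read is only issued from thenCompose after the Vault value has been applied, and join() keeps start() from returning with the empty initial topology. Passing Suppliers rather than already-created futures is what keeps the second read from starting early.
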



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
