[
https://issues.apache.org/jira/browse/IGNITE-20310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mirza Aliev updated IGNITE-20310:
---------------------------------
Labels: dzm-reviewed ignite-3 (was: ignite-3)
> Meta storage invokes are not completed when DZM start is completed
> -------------------------------------------------------------------
>
> Key: IGNITE-20310
> URL: https://issues.apache.org/jira/browse/IGNITE-20310
> Project: Ignite
> Issue Type: Bug
> Reporter: Sergey Uttsel
> Priority: Major
> Labels: dzm-reviewed, ignite-3
>
> h3. *Motivation*
> DistributionZoneManager start performs meta storage invokes. Currently
> they are issued from
> DistributionZoneManager#createOrRestoreZoneState:
> # DistributionZoneManager#initDataNodesAndTriggerKeysInMetaStorage to init
> the default zone.
> # DistributionZoneManager#restoreTimers for the case when a filter update was
> handled before DZM stopped but had not yet updated data nodes.
> The futures of these invokes are ignored, so when the start method returns,
> not all start actions have actually completed. This can lead to the following
> situation:
> * Initialisation of the default zone hangs for some reason, even after a
> full restart of the cluster.
> * That means that none of the data nodes related keys in the meta storage
> have been initialised.
> * For example, if a user adds a new node and the scale up timer is immediate,
> which should lead to an immediate data nodes recalculation, this recalculation
> won't happen, because the data nodes keys have not been initialised.
> h3. *Possible solutions*
> h4. Easier
> We just need to wait for all async logic to complete within
> {{DistributionZoneManager#start}}, using {{ms.invoke().join()}}.
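A minimal sketch of the "Easier" option, assuming a stand-in class rather than the real DistributionZoneManager. The class name, the dataNodesInitialised flag and the invokeInitDataNodes method are hypothetical and only simulate an asynchronous meta storage invoke; the point is the difference between dropping the future and joining on it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for DistributionZoneManager; not the real Ignite class. */
class ZoneManagerJoinSketch {
    /** Set once the simulated meta storage invoke has been applied. */
    volatile boolean dataNodesInitialised;

    /** Simulates an asynchronous meta storage invoke that completes after a short delay. */
    CompletableFuture<Boolean> invokeInitDataNodes() {
        return CompletableFuture.supplyAsync(() -> {
            dataNodesInitialised = true;
            return true;
        }, CompletableFuture.delayedExecutor(50, TimeUnit.MILLISECONDS));
    }

    /** Behaviour described in the issue: the future is dropped, so start may return early. */
    void startIgnoringFuture() {
        invokeInitDataNodes();
    }

    /** "Easier" option: block the starting thread until the invoke has completed. */
    void startWithJoin() {
        invokeInitDataNodes().join();
    }

    public static void main(String[] args) {
        ZoneManagerJoinSketch mgr = new ZoneManagerJoinSketch();
        mgr.startWithJoin();
        // By the time startWithJoin returns, the simulated invoke has been applied.
        System.out.println("data nodes initialised: " + mgr.dataNodesInitialised);
    }
}
```

Note that join() blocks the thread that calls start until the invoke finishes, which is the trade-off that the "Harder" option avoids.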
> h4. Harder
> We can enhance {{IgniteComponent#start}} so that it returns a
> CompletableFuture, and then change the flow of starting components so that
> the node is not ready to work until all {{IgniteComponent#start}} futures are
> completed. For example, we can chain our futures in
> {{IgniteImpl#recoverComponentsStateOnStart}}, so that the components' futures
> are completed before {{metaStorageMgr.deployWatches()}}.
> In {{DistributionZoneManager#start}} we can return
> {{CompletableFuture.allOf}} of the futures that need to be completed in
> {{DistributionZoneManager#start}}.
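A rough sketch of the "Harder" option, under the assumption that a component's start returns a future. AsyncComponent, ZoneManager and startNode are hypothetical names for illustration, not the real IgniteComponent or IgniteImpl API; the deployWatches argument stands in for the metaStorageMgr.deployWatches() call mentioned above.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

class AsyncStartSketch {
    /** Hypothetical variant of IgniteComponent whose start reports completion asynchronously. */
    interface AsyncComponent {
        CompletableFuture<Void> start();
    }

    /** Hypothetical zone manager that exposes the futures of its start-time invokes. */
    static class ZoneManager implements AsyncComponent {
        final CompletableFuture<Boolean> initDefaultZone = new CompletableFuture<>();
        final CompletableFuture<Boolean> restoreTimers = new CompletableFuture<>();

        @Override
        public CompletableFuture<Void> start() {
            // Completes only when every start-time invoke has completed.
            return CompletableFuture.allOf(initDefaultZone, restoreTimers);
        }
    }

    /** Starts all components and runs deployWatches only after every start future completes. */
    static CompletableFuture<Void> startNode(List<? extends AsyncComponent> components,
                                             Runnable deployWatches) {
        CompletableFuture<?>[] starts = components.stream()
                .map(AsyncComponent::start)
                .toArray(CompletableFuture[]::new);
        return CompletableFuture.allOf(starts).thenRun(deployWatches);
    }

    public static void main(String[] args) {
        ZoneManager zm = new ZoneManager();
        CompletableFuture<Void> ready =
                startNode(List.of(zm), () -> System.out.println("watches deployed"));
        // Simulate the two meta storage invokes finishing.
        zm.initDefaultZone.complete(true);
        zm.restoreTimers.complete(true);
        ready.join();
    }
}
```

In this sketch the starting thread is never blocked; readiness is expressed purely by the returned future, so the node can decide when to wait on it.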
> h3. *Definition of done*
> All asynchronous logic in the {{DistributionZoneManager#start}} is done
> before a node is ready to work, in particular, ready to interact with zones.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)