[ 
https://issues.apache.org/jira/browse/IGNITE-20310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mirza Aliev updated IGNITE-20310:
---------------------------------
    Description: 
h3. *Motivation*

DistributionZoneManager issues meta storage invokes during its start. 
Currently these invokes happen in 
DistributionZoneManager#createOrRestoreZoneState:
# DistributionZoneManager#initDataNodesAndTriggerKeysInMetaStorage, to 
initialise the default zone.
# DistributionZoneManager#restoreTimers, for the case when a filter update was 
handled before DZM stopped but did not update data nodes.

The futures of these invokes are ignored, so not all start actions have 
actually completed when the start method returns. This can lead to the 
following situation: 
* Initialisation of the default zone hangs for some reason, even after a full 
restart of the cluster.
* As a result, none of the data nodes related keys in the meta storage have 
been initialised.
* For example, if a user adds a new node and the scale up timer is immediate, 
which should trigger an immediate data nodes recalculation, the recalculation 
won't happen, because the data nodes key has not been initialised. 

h3. *Possible solutions*
h4. Easier
We just need to wait for all async logic to complete within 
{{DistributionZoneManager#start}}, using {{ms.invoke().join()}}.
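
A minimal sketch of the difference between dropping the invoke future and joining on it. {{MetaStorage}}, {{ZoneManager}} and both start methods are hypothetical stand-ins invented for illustration; they are not the actual Ignite classes or signatures.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch only: MetaStorage and ZoneManager are hypothetical stand-ins, not Ignite classes. */
class JoinOnStartSketch {
    /** Hypothetical meta storage whose invoke completes asynchronously. */
    interface MetaStorage {
        CompletableFuture<Boolean> invoke();
    }

    static class ZoneManager {
        private final MetaStorage ms;

        ZoneManager(MetaStorage ms) {
            this.ms = ms;
        }

        /** Mirrors the reported problem: the future is dropped, so start may return early. */
        void startIgnoringFuture() {
            ms.invoke();
        }

        /** Mirrors the "Easier" option: start returns only after the invoke has completed. */
        void startAndJoin() {
            ms.invoke().join();
        }
    }

    public static void main(String[] args) {
        AtomicBoolean dataNodesInitialised = new AtomicBoolean();

        // This invoke stays pending because nobody completes the future, which imitates a hang.
        CompletableFuture<Boolean> pending = new CompletableFuture<>();
        MetaStorage slow = () -> pending.thenApply(ok -> {
            dataNodesInitialised.set(true);
            return ok;
        });

        new ZoneManager(slow).startIgnoringFuture();
        System.out.println("initialised after ignoring start: " + dataNodesInitialised.get());

        // This invoke completes on another thread; join() makes start wait for it.
        MetaStorage async = () -> CompletableFuture.supplyAsync(() -> {
            dataNodesInitialised.set(true);
            return true;
        });

        new ZoneManager(async).startAndJoin();
        System.out.println("initialised after joined start: " + dataNodesInitialised.get());
    }
}
```

The trade-off of this option is that the start method blocks the calling thread for as long as the invoke takes, so a hanging invoke would hang the start as well instead of being silently ignored.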

h4. Harder
We can enhance {{IgniteComponent#start}} so that it returns a 
CompletableFuture, and then change the component start flow so that a node is 
not ready to work until all {{IgniteComponent#start}} futures have completed. 
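
One way this flow could look, as a rough sketch. {{Component}}, {{Node}} and {{isReady}} are hypothetical names made up for illustration and do not reflect the real {{IgniteComponent}} interface or the actual node start code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Sketch only: Component and Node are hypothetical, not the real IgniteComponent API. */
class AsyncStartSketch {
    /** Hypothetical component whose start reports completion through a future. */
    interface Component {
        CompletableFuture<Void> start();
    }

    static class Node {
        private final List<Component> components;
        private volatile boolean ready;

        Node(List<Component> components) {
            this.components = components;
        }

        /** Starts components in order; the node becomes ready once every start future completes. */
        CompletableFuture<Void> start() {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (Component component : components) {
                futures.add(component.start());
            }
            return CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0]))
                    .thenRun(() -> ready = true);
        }

        boolean isReady() {
            return ready;
        }
    }

    public static void main(String[] args) {
        // Stands in for a zone manager whose meta storage invoke has not finished yet.
        CompletableFuture<Void> zoneInit = new CompletableFuture<>();
        Component alreadyStarted = () -> CompletableFuture.completedFuture(null);
        Component zones = () -> zoneInit;

        Node node = new Node(List.of(alreadyStarted, zones));
        CompletableFuture<Void> started = node.start();
        System.out.println("ready before zone init: " + node.isReady());

        zoneInit.complete(null);
        started.join();
        System.out.println("ready after zone init: " + node.isReady());
    }
}
```

Compared with joining inside a single component, this keeps each start method non-blocking and moves the waiting to one place in the node start sequence.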



h3. *Definition of done*

All asynchronous logic in {{DistributionZoneManager#start}} is done before a 
node is ready to work, in particular, before it is ready to interact with zones.



> Meta storage invokes are not completed  when DZM start is completed
> -------------------------------------------------------------------
>
>                 Key: IGNITE-20310
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20310
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Sergey Uttsel
>            Priority: Major
>              Labels: ignite-3



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
