[ 
https://issues.apache.org/jira/browse/IGNITE-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Muzafarov updated IGNITE-10485:
-------------------------------------
    Fix Version/s:     (was: 2.8)
                   2.9

> Ability to get know more about cluster state before NODE_JOINED event is 
> fired cluster-wide
> -------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-10485
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10485
>             Project: Ignite
>          Issue Type: Improvement
>          Components: cache
>            Reporter: Pavel Kovalenko
>            Priority: Major
>             Fix For: 2.9
>
>
> Currently there is no good way to learn more about the cluster before PME 
> (partition map exchange) starts on node join.
> It might be useful to do some pre-work (activate components if the cluster 
> is active, calculate baseline affinity, clean up PDS if the baseline has 
> changed, etc.) before the actual NODE_JOINED event is triggered cluster-wide 
> and PME is started.
> Such pre-work would significantly speed up PME on node join.
> Currently the only place where this can be done is while processing the 
> NodeAdded message on the local joining node.
> But that is not a good idea, because it would freeze processing of new 
> discovery messages cluster-wide.
> I see two ways to implement it:
> 1) Introduce a new intermediate node state in which the node is discovered 
> but the discovery event for its join has not been triggered yet. This is the 
> right approach, but a complicated change, because it requires revisiting the 
> join process in both the Tcp and Zk discovery protocols, with extra failover 
> scenarios.
> 2) Try to get this information and do the pre-work before the discovery 
> manager starts, using e.g. GridRestProcessor. This looks much simpler, but 
> there can be races when the cluster state changes during the pre-work 
> (deactivation, baseline change). In that case we should roll the pre-work 
> back or just stop/restart the node to avoid cluster instability. However, 
> these are rare scenarios in the real world (e.g. starting a baseline node and 
> starting the deactivation process right after node recovery has finished).
> For starters, we can expose the baseline and cluster state through our REST 
> endpoint and try to move the pre-work mentioned above out of PME.
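
The race described in option 2 can be illustrated with a minimal sketch. It
is not Ignite code: ClusterSnapshot, planPreWork and decide are hypothetical
names, and the snapshot is assumed to come from whatever the REST endpoint
would eventually expose. The idea is to read the state, do the pre-work, read
the state again, and only join if nothing changed in between.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class PreJoinSketch {
    /** Hypothetical view of the cluster as read before the discovery manager starts. */
    static final class ClusterSnapshot {
        final boolean active;
        final Set<String> baselineNodeIds;

        ClusterSnapshot(boolean active, Set<String> baselineNodeIds) {
            this.active = active;
            this.baselineNodeIds = Set.copyOf(baselineNodeIds);
        }

        /** True if neither the activation flag nor the baseline differs. */
        boolean sameAs(ClusterSnapshot other) {
            return active == other.active && baselineNodeIds.equals(other.baselineNodeIds);
        }
    }

    enum Decision { JOIN, ROLLBACK_OR_RESTART }

    /** Lists pre-work steps for the observed state (step names are illustrative only). */
    static List<String> planPreWork(ClusterSnapshot seen) {
        List<String> steps = new ArrayList<>();
        if (seen.active)
            steps.add("activate components");
        if (!seen.baselineNodeIds.isEmpty())
            steps.add("calculate baseline affinity");
        return steps;
    }

    /** Compares the state seen before and after the pre-work to detect a race. */
    static Decision decide(ClusterSnapshot beforePreWork, ClusterSnapshot afterPreWork) {
        return beforePreWork.sameAs(afterPreWork) ? Decision.JOIN : Decision.ROLLBACK_OR_RESTART;
    }

    public static void main(String[] args) {
        ClusterSnapshot before = new ClusterSnapshot(true, Set.of("n1", "n2"));
        ClusterSnapshot unchanged = new ClusterSnapshot(true, Set.of("n1", "n2"));
        ClusterSnapshot deactivated = new ClusterSnapshot(false, Set.of("n1", "n2"));

        System.out.println(planPreWork(before));
        System.out.println(decide(before, unchanged));
        System.out.println(decide(before, deactivated));
    }
}
```

A check like this only narrows the window: the state can still change between
the second read and the actual join, which is why option 1 is described above
as the right but more complicated fix.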



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
