Pavel Kovalenko created IGNITE-10485:
----------------------------------------

             Summary: Ability to get know more about cluster state before 
NODE_JOINED event is fired cluster-wide
                 Key: IGNITE-10485
                 URL: https://issues.apache.org/jira/browse/IGNITE-10485
             Project: Ignite
          Issue Type: Improvement
          Components: cache
            Reporter: Pavel Kovalenko
             Fix For: 2.8


Currently there is no good way to learn more about the cluster before PME 
(partition map exchange) starts on node join.

It might be useful to do some pre-work (activate components if the cluster is 
active, calculate baseline affinity, clean up PDS if the baseline changed, etc.) 
before the actual NODE_JOINED event is triggered cluster-wide and PME is started.
Such pre-work would significantly speed up PME in case of a node join.
Currently the only place where it can be done is while processing the NodeAdded 
message on the local joining node. 
But that is not a good idea, because it would freeze the processing of new 
discovery messages cluster-wide.

I see two ways to implement it:

1) Introduce a new intermediate node state: the node is discovered, but the 
discovery event for its join is not triggered yet. This is the right change, but 
a complicated one, because it requires revisiting the joining process in both 
the Tcp and Zk discovery protocols, with extra failover scenarios.

2) Try to get this information and do the pre-work before the discovery manager 
starts, using e.g. GridRestProcessor. This looks much simpler, but there can be 
races when the cluster state changes during the pre-work (deactivation, 
baseline change). In this case we should roll the pre-work back or just 
stop/restart the node to avoid cluster instability. However, such scenarios are 
rare in the real world (e.g. starting a baseline node and starting deactivation 
right after node recovery is finished).
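
The race handling in option 2 could be sketched as an optimistic check: read 
the cluster state, do the pre-work, read the state again, and roll back if it 
changed in between. This is only an illustration under stated assumptions. The 
names ClusterSnapshot, PreWork and PreJoinPreparer are hypothetical and are not 
existing Ignite classes, and the state source stands in for whatever mechanism 
(e.g. a REST call) would provide the data.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical view of the cluster as seen by a node that has not joined yet. */
final class ClusterSnapshot {
    final boolean active;
    final Set<String> baselineNodeIds;

    ClusterSnapshot(boolean active, Set<String> baselineNodeIds) {
        this.active = active;
        // Defensive copy so that later changes by the caller do not affect the snapshot.
        this.baselineNodeIds = Collections.unmodifiableSet(new HashSet<>(baselineNodeIds));
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof ClusterSnapshot))
            return false;

        ClusterSnapshot other = (ClusterSnapshot)o;

        return active == other.active && baselineNodeIds.equals(other.baselineNodeIds);
    }

    @Override public int hashCode() {
        return Objects.hash(active, baselineNodeIds);
    }
}

/** Hypothetical hooks for the pre-work (component activation, affinity calculation, PDS cleanup). */
interface PreWork {
    void apply(ClusterSnapshot snapshot);

    void rollback();
}

final class PreJoinPreparer {
    /**
     * Runs the pre-work against the state read before it started and re-reads
     * the state afterwards.
     *
     * @return true if the pre-work was kept, false if the state changed and it was rolled back.
     */
    static boolean prepare(Supplier<ClusterSnapshot> stateSource, PreWork work) {
        ClusterSnapshot before = stateSource.get();

        work.apply(before);

        ClusterSnapshot after = stateSource.get();

        if (before.equals(after))
            return true;

        // Deactivation or a baseline change happened during the pre-work.
        work.rollback();

        return false;
    }
}
```

Note that the second read only narrows the window: the state can still change 
between that read and the actual join, so the node would have to validate the 
pre-work once more when it really joins, or be stopped/restarted as described 
above.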

For starters we can expose the baseline and cluster state in our REST endpoint 
and try to move the pre-work mentioned above out of PME. 
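
As a rough illustration of what a not-yet-joined node could do with such an 
endpoint, the sketch below asks a running node whether the cluster is active. 
It assumes the HTTP REST API's currentstate command and a JSON reply carrying a 
boolean "response" field. The ClusterStateProbe class, the host/port values and 
the regex-based parsing are illustrative assumptions only, and no baseline 
command is shown because this issue only proposes exposing it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Ignite. */
final class ClusterStateProbe {
    /** Matches the boolean payload of the reply, e.g. {"successStatus":0,...,"response":true}. */
    private static final Pattern RESPONSE_FLAG =
        Pattern.compile("\"response\"\\s*:\\s*(true|false)");

    /** Builds the REST URI for a command on the given node. */
    static URI commandUri(String host, int port, String cmd) {
        return URI.create("http://" + host + ":" + port + "/ignite?cmd=" + cmd);
    }

    /** Extracts the boolean "response" field; a real client would use a JSON parser. */
    static boolean parseActive(String json) {
        Matcher m = RESPONSE_FLAG.matcher(json);

        if (!m.find())
            throw new IllegalArgumentException("No boolean 'response' field in: " + json);

        return Boolean.parseBoolean(m.group(1));
    }

    /** Asks a running node whether the cluster is active (assumed command: currentstate). */
    static boolean isClusterActive(String host, int port) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection)commandUri(host, port, "currentstate").toURL().openConnection();

        conn.setConnectTimeout(2000);
        conn.setReadTimeout(2000);

        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];

            // Read the whole reply body.
            for (int n; (n = in.read(chunk)) != -1; )
                buf.write(chunk, 0, n);

            return parseActive(new String(buf.toByteArray(), StandardCharsets.UTF_8));
        }
        finally {
            conn.disconnect();
        }
    }
}
```

A joining node could call isClusterActive("some-host", 8080) before starting its 
discovery manager and skip component activation when the result is false.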



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
