[ https://issues.apache.org/jira/browse/IGNITE-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maxim Muzafarov updated IGNITE-10485:
-------------------------------------
Fix Version/s: 2.9 (was: 2.8)
> Ability to get know more about cluster state before NODE_JOINED event is fired cluster-wide
> -------------------------------------------------------------------------------------------
>
> Key: IGNITE-10485
> URL: https://issues.apache.org/jira/browse/IGNITE-10485
> Project: Ignite
> Issue Type: Improvement
> Components: cache
> Reporter: Pavel Kovalenko
> Priority: Major
> Fix For: 2.9
>
>
> Currently there is no good way to learn more about the cluster state
> before PME (partition map exchange) on node join is started.
> It might be useful to do some pre-work (activate components if the cluster
> is active, calculate baseline affinity, clean up PDS if the baseline has
> changed, etc.) before the actual NODE_JOINED event is triggered cluster-wide
> and PME is started. Such pre-work should significantly speed up PME on node
> join.
> Currently the only place where this can be done is while the joining node
> processes the NodeAdded message locally. But that is not a good idea,
> because it would stall the processing of new discovery messages
> cluster-wide.
> I see two ways to implement it:
> 1) Introduce a new intermediate node state in which the node is discovered
> but the discovery event for the node join has not been triggered yet. This
> is the right approach, but a complicated change, because it requires
> revisiting the join process in both the TCP and ZooKeeper (Zk) discovery
> protocols, with extra failover scenarios.
> 2) Try to get this information and do the pre-work before the discovery
> manager starts, e.g. using GridRestProcessor. This looks much simpler, but
> races are possible if the cluster state changes during the pre-work
> (deactivation, baseline change). In that case we should roll the pre-work
> back, or simply stop/restart the node to avoid cluster instability.
> However, these are rare scenarios in the real world (e.g. a baseline node
> is started and deactivation begins right after the node's recovery has
> finished).
> For starters, we can expose the baseline and cluster state via our REST
> endpoint and try to move the pre-work mentioned above out of PME.
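A minimal Java sketch of the race check in option 2, under stated assumptions: `ClusterSnapshot` is a hypothetical stand-in for whatever the REST endpoint would expose (active flag, a hypothetical baseline version counter, baseline consistent IDs), and `fetchState`, `preWork`, `rollback`, and `JoinPreparer` are illustrative placeholders, not Ignite APIs. The node reads the state, does the pre-work against that snapshot, then re-reads the state and rolls the pre-work back if anything changed. The re-check only narrows the race window; the state can still change between the re-check and the actual join, so the stop/restart fallback is still needed.

```java
import java.util.Objects;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical view of the cluster state a joining node could fetch over REST. */
final class ClusterSnapshot {
    final boolean active;              // is the cluster active?
    final long baselineVersion;        // hypothetical counter bumped on every baseline change
    final Set<String> baselineNodeIds; // consistent IDs of the baseline nodes

    ClusterSnapshot(boolean active, long baselineVersion, Set<String> baselineNodeIds) {
        this.active = active;
        this.baselineVersion = baselineVersion;
        this.baselineNodeIds = Set.copyOf(baselineNodeIds);
    }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ClusterSnapshot)) return false;
        ClusterSnapshot s = (ClusterSnapshot) o;
        return active == s.active
            && baselineVersion == s.baselineVersion
            && baselineNodeIds.equals(s.baselineNodeIds);
    }

    @Override public int hashCode() {
        return Objects.hash(active, baselineVersion, baselineNodeIds);
    }
}

/** Hypothetical pre-join step, run before the discovery manager starts. */
final class JoinPreparer {
    enum Outcome { READY, ROLLED_BACK }

    static Outcome prepare(Supplier<ClusterSnapshot> fetchState,
                           Consumer<ClusterSnapshot> preWork,
                           Runnable rollback) {
        ClusterSnapshot before = fetchState.get();
        // e.g. activate components, calculate baseline affinity, clean up PDS
        preWork.accept(before);
        ClusterSnapshot after = fetchState.get();
        if (before.equals(after))
            return Outcome.READY; // caller may proceed with the join
        // State changed during pre-work (deactivation, baseline change): undo it.
        rollback.run();
        return Outcome.ROLLED_BACK;
    }
}
```

On `ROLLED_BACK` the caller could retry `prepare` or stop/restart the node, as described above.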
--
This message was sent by Atlassian Jira
(v8.3.4#803005)