Roman Puchkovskiy created IGNITE-20076:
------------------------------------------
Summary: Improve networking shutdown implementation
Key: IGNITE-20076
URL: https://issues.apache.org/jira/browse/IGNITE-20076
Project: Ignite
Issue Type: Bug
Reporter: Roman Puchkovskiy
Assignee: Roman Puchkovskiy
Currently, when initiating an Ignite node's shutdown, we first stop ScaleCube's
cluster (so that it sends a LEAVING message), and only when it has completely
shut down do we stop the connection manager. As a result, there is an interval
when the node's networking thinks it's still alive (and hence it tries to
restore connections with other nodes), but other nodes think the node has
already left (as they received that LEAVING message from it), so they don't let
it establish connections. The stopping node sees that it is rejected and tries to
handle this as a critical failure. Currently, it just logs a scary message,
but, when we implement a proper failure handler, this will kill the node. This
is not acceptable for a graceful stop scenario.
The idea is to first (before stopping the ScaleCube local cluster) tell
the connection manager that it is now in the 'stopping' state. In this state,
it does not try to establish new connections (and does not attempt to
reconnect) and does not allow any incoming connections; also, it does not
handle rejections by other nodes as critical failures in this state.
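A minimal Java sketch of the proposed 'stopping' state might look as follows.
All names in it (StoppableConnectionManager, initiateStopping, RejectionOutcome,
ShutdownSequence) are hypothetical and are not taken from the Ignite code base;
the two Runnable parameters stand in for the real ScaleCube and connection
manager stop routines, which are not shown here.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of a connection manager with a 'stopping' state. */
class StoppableConnectionManager {
    /** How a rejection by a remote node should be treated. */
    enum RejectionOutcome { CRITICAL_FAILURE, IGNORED }

    /** Set once a graceful stop begins; never reset. */
    private final AtomicBoolean stopping = new AtomicBoolean(false);

    /** Switches the manager to the 'stopping' state. */
    void initiateStopping() {
        stopping.set(true);
    }

    /** No new outgoing connections (including reconnects) while stopping. */
    boolean mayOpenOutgoingConnection() {
        return !stopping.get();
    }

    /** No incoming connections are accepted while stopping. */
    boolean mayAcceptIncomingConnection() {
        return !stopping.get();
    }

    /** A rejection is expected during a graceful stop, so it is not critical then. */
    RejectionOutcome onRejectedByPeer() {
        return stopping.get() ? RejectionOutcome.IGNORED : RejectionOutcome.CRITICAL_FAILURE;
    }
}

/** Hypothetical sketch of the proposed shutdown ordering. */
final class ShutdownSequence {
    private ShutdownSequence() {
    }

    static void gracefulStop(
            StoppableConnectionManager connectionManager,
            Runnable stopCluster,             // assumed: stops ScaleCube, which sends LEAVING
            Runnable stopConnectionManager    // assumed: closes the remaining connections
    ) {
        // 1. Mark the manager as stopping BEFORE other nodes learn that we are leaving.
        connectionManager.initiateStopping();
        // 2. Only now stop the cluster, so later rejections are not treated as critical.
        stopCluster.run();
        // 3. Finally, shut the connection manager down.
        stopConnectionManager.run();
    }
}
```

The flag is set before the cluster is stopped, so by the time other nodes start
rejecting connections, the local node no longer attempts to reconnect and no
longer escalates rejections.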
--
This message was sent by Atlassian Jira
(v8.20.10#820010)