[jira] [Created] (IGNITE-20076) Improve networking shutdown implementation

Roman Puchkovskiy (Jira) Thu, 27 Jul 2023 07:00:49 -0700

Roman Puchkovskiy created IGNITE-20076:
------------------------------------------


             Summary: Improve networking shutdown implementation
                 Key: IGNITE-20076
                 URL: https://issues.apache.org/jira/browse/IGNITE-20076
             Project: Ignite
          Issue Type: Bug
            Reporter: Roman Puchkovskiy
            Assignee: Roman Puchkovskiy


Currently, when shutting down an Ignite node, we first stop ScaleCube's
cluster (so that it sends a LEAVING message), and only when it has completely
shut down do we stop the connection manager. As a result, there is an interval
during which the node's networking thinks the node is still alive (and hence
tries to restore connections with other nodes), but the other nodes think the
node has already left (as they received the LEAVING message from it), so they
don't let it establish connections. The first node sees that it is rejected
and tries to handle this as a critical failure. Currently, it just logs a
scary message, but when we implement a proper failure handler, this will kill
the node. This is not acceptable for a graceful stop scenario.
The idea is to first (before stopping the ScaleCube local cluster) tell
the connection manager that it is now in the 'stopping' state. In this state,
it does not try to establish new connections (and does not attempt to
reconnect) and does not allow any incoming connections; also, it does not
handle rejections by other nodes as critical failures.
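
A minimal sketch of the proposed 'stopping' state, for illustration only. The
class and method names (StoppableConnectionManager, initiateStopping,
onRejectedByPeer and so on) are hypothetical and are not taken from the Ignite
code base; the real connection manager may be structured quite differently.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch; NOT the actual Ignite connection manager API. */
class StoppableConnectionManager {
    /** What the manager does when a remote node rejects a connection attempt. */
    enum RejectionOutcome { IGNORED, CRITICAL_FAILURE }

    /** Set once a graceful shutdown begins; never reset. */
    private final AtomicBoolean stopping = new AtomicBoolean(false);

    /** Assumed to be called first, before the ScaleCube local cluster is stopped. */
    void initiateStopping() {
        stopping.set(true);
    }

    /** New outgoing connections (including reconnects) are refused while stopping. */
    boolean mayOpenOutgoingConnection() {
        return !stopping.get();
    }

    /** Incoming connections are refused while stopping. */
    boolean mayAcceptIncomingConnection() {
        return !stopping.get();
    }

    /** A rejection by a peer is critical only if this node is not stopping. */
    RejectionOutcome onRejectedByPeer() {
        return stopping.get() ? RejectionOutcome.IGNORED : RejectionOutcome.CRITICAL_FAILURE;
    }
}
```

Under these assumptions, the shutdown sequence would call initiateStopping()
first, then stop the ScaleCube cluster, then stop the connection manager
itself, so a rejection arriving in between is no longer escalated.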



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
