Vladsz83 opened a new pull request #8881: URL: https://github.com/apache/ignite/pull/8881
If a node loses its outgoing connections, it can decide that it is alone in the cluster and will not fail. This happens on small clusters, where the failed node manages to attempt a connection to every other node before connRecoveryTimeout expires.

Consider the cluster n1 -> n2 -> n3 -> n4 -> n1:

n4 loses all outgoing connections.
n3 still pings n4 successfully.
n4 attempts to connect to n1, n2, and n3. Each attempt fails because of the outgoing network failure.
spi.connRecoveryTimeout is not reached.
n4 decides it is alone and continues working.
n3 still sends messages to n4, so n4 does not lack incoming connections.
The ring is actually broken because of n4, but n3 cannot detect the failure of n4.

Solution: a node could watch its incoming traffic, which shows whether the incoming network path still works. If all outgoing connections are lost but messages are still being received, the node must leave the grid to prevent the ring from breaking.
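
The decision rule described above can be sketched as follows. This is a minimal illustration, not the code in this pull request and not Ignite's discovery SPI: the class `OutgoingFailureCheck`, the `Decision` enum, the `decide` method, and the `incomingWindowNanos` parameter are all hypothetical names introduced for the example.

```java
import java.util.List;

/** Hypothetical sketch of the proposed check; not part of Apache Ignite. */
public class OutgoingFailureCheck {
    /** Possible outcomes once a node has tried to reach every other node. */
    public enum Decision { STAY_IN_RING, CONTINUE_ALONE, LEAVE_GRID }

    /**
     * @param outgoingOk          result of the connection attempt to each other node
     * @param lastIncomingNanos   time the last message from another node was received
     * @param nowNanos            current time
     * @param incomingWindowNanos how recent an incoming message must be to count as
     *                            live incoming traffic (assumed tuning parameter)
     */
    public static Decision decide(List<Boolean> outgoingOk, long lastIncomingNanos,
                                  long nowNanos, long incomingWindowNanos) {
        // At least one outgoing connection works, so the ring can be closed through it.
        if (outgoingOk.contains(Boolean.TRUE))
            return Decision.STAY_IN_RING;

        // Every outgoing attempt failed. Recent incoming traffic means the other nodes
        // are alive and still see this node, so this node is the one breaking the ring.
        boolean receiving = nowNanos - lastIncomingNanos <= incomingWindowNanos;

        return receiving ? Decision.LEAVE_GRID : Decision.CONTINUE_ALONE;
    }

    public static void main(String[] args) {
        // n4 in the scenario above: attempts to n1, n2, n3 all fail,
        // but a message from n3 arrived 100 ns ago (window is 500 ns).
        Decision d = decide(List.of(false, false, false), 900L, 1_000L, 500L);
        System.out.println(d); // prints LEAVE_GRID
    }
}
```

Without the incoming-traffic check, the case where every outgoing attempt fails would always end in the node continuing alone, which is the behavior reported here.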
