Vladsz83 opened a new pull request #8881:
URL: https://github.com/apache/ignite/pull/8881


   If a node loses its outgoing connections, it can decide it is alone in the 
cluster and will not fail. This happens on small clusters, where the failed node 
attempts to connect to every other node before connRecoveryTimeout expires.
   
   Consider:
   
   The cluster is a ring: n1 -> n2 -> n3 -> n4 -> n1.
   n4 loses all outgoing connections.
   n3 keeps pinging n4 successfully.
   n4 attempts to connect to n1, n2 and n3. Each attempt fails due to the 
outgoing network failure.
   spi.connRecoveryTimeout is not reached. n4 decides it is alone and continues 
working.
   n3 still sends messages to n4, so n4 does not lack incoming connections.
   The ring is actually broken because of n4, but n3 cannot detect the failure 
of n4.
   Solution: the node could watch its incoming traffic, which indicates that the 
incoming network is alive. If all outgoing connections are lost but messages are 
still received, the node must leave the grid to prevent a ring break.
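
   The idea above can be sketched roughly as follows. This is only an 
illustration of the decision rule, not the code in this pull request: the class 
IncomingTrafficWatch, its methods and the freshness window are hypothetical 
names, and time is passed in explicitly to keep the sketch deterministic.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper (not part of Ignite): remembers when the last message
 * arrived and decides what a node should do once its outgoing connection
 * attempts have failed.
 */
final class IncomingTrafficWatch {
    /** Possible outcomes of the check. */
    enum Decision { STAY_IN_RING, CONTINUE_ALONE, LEAVE_GRID }

    /** Marker meaning "no message has been received yet". */
    private static final long NEVER = -1L;

    /** Assumed window within which incoming traffic counts as recent. */
    private final long incomingFreshnessMs;

    /** Timestamp of the last received message, or NEVER. */
    private final AtomicLong lastIncomingMs = new AtomicLong(NEVER);

    IncomingTrafficWatch(long incomingFreshnessMs) {
        this.incomingFreshnessMs = incomingFreshnessMs;
    }

    /** Called whenever a message from another node is received. */
    void onMessageReceived(long nowMs) {
        lastIncomingMs.set(nowMs);
    }

    /**
     * @param otherNodes number of other nodes in the ring
     * @param failedOutgoingAttempts how many of them could not be reached
     * @param nowMs current time in milliseconds
     */
    Decision decide(int otherNodes, int failedOutgoingAttempts, long nowMs) {
        // At least one node is still reachable: the ring can be kept.
        if (failedOutgoingAttempts < otherNodes)
            return Decision.STAY_IN_RING;

        long last = lastIncomingMs.get();
        boolean incomingAlive = last != NEVER && nowMs - last <= incomingFreshnessMs;

        // Nobody is reachable, yet messages still arrive: other nodes are
        // alive, so this node is the broken link and should leave.
        // Nobody is reachable and nothing arrives: the node may really be alone.
        return incomingAlive ? Decision.LEAVE_GRID : Decision.CONTINUE_ALONE;
    }
}
```

   In the n1..n4 example, n4 would call onMessageReceived(...) for each message 
from n3. After failing to connect to n1, n2 and n3, decide(3, 3, now) would 
return LEAVE_GRID rather than letting n4 continue as a single-node cluster.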



