Vladsz83 opened a new pull request #8484:
URL: https://github.com/apache/ignite/pull/8484


   Scenario:
   Two nodes fail at the same time. Their relative positions in the ring are N-1 and N+2.
   Node N detects the failure of node N+1 and tries to connect to node N+2. Node N+2 then checks its backward connection to node N+1.
   
   Problem:
   Node N can fail too.
   
   Cause:
   The timeout node N uses to recover its connection to node N+2 turns out to be shorter than the timeout node N+2 uses to check its connection to node N+1.
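   A minimal Java model of this race, for illustration only. It assumes node N+1 is dead, so node N+2's backward check runs for its full timeout before N+2 accepts node N. The class, method names and millisecond values are hypothetical and are not part of Ignite's discovery API.

```java
/**
 * Hypothetical model of the timing race described above; not Ignite code.
 * Assumption: N+1 is dead, so N+2's backward check of N+1 always runs for
 * its full timeout before N+2 answers node N.
 */
public final class RingRecoveryRace {
    public enum Outcome { RING_RESTORED, NODE_N_GIVES_UP }

    /**
     * @param recoveryTimeoutMs      how long node N waits to reconnect to N+2
     * @param backwardCheckTimeoutMs how long N+2 waits on the dead node N+1
     */
    public static Outcome simulate(long recoveryTimeoutMs, long backwardCheckTimeoutMs) {
        // N+2 can only accept N once its check of N+1 has timed out (assumed).
        long nPlus2AnswersAtMs = backwardCheckTimeoutMs;
        // N succeeds only if that answer arrives before its own timeout fires.
        return nPlus2AnswersAtMs < recoveryTimeoutMs
            ? Outcome.RING_RESTORED
            : Outcome.NODE_N_GIVES_UP;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        System.out.println(simulate(5_000, 10_000)); // NODE_N_GIVES_UP: the ordering described in Cause
        System.out.println(simulate(10_000, 5_000)); // RING_RESTORED: the ordering the fix aims for
    }
}
```

   The model reduces the problem to one comparison: whichever timeout expires first decides whether node N joins N+2 or gives up.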
   
   Fix:
   Introduced a base timeout for checking and recovering a connection. It is derived from the current configuration rather than being a constant. The timeouts mentioned above are now relative to this base value, so the backward connection check timeout is generally shorter than the connection recovery timeout.
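   A sketch of the idea of deriving both timeouts from one configured base value. The PR does not state the actual ratios or which setting supplies the base; the class name, the method names and the fixed 1/2 ratio below are assumptions. With that fixed ratio the check is always shorter, whereas the PR only says it is "generally" shorter.

```java
import java.time.Duration;

/**
 * Hypothetical sketch: connection timeouts derived from one configured base
 * value instead of hard-coded constants. Names and ratios are assumptions.
 */
public final class RelativeTimeouts {
    /** Base value, e.g. taken from a node's configuration (assumed source). */
    private final Duration base;

    public RelativeTimeouts(Duration base) {
        if (base.isNegative() || base.isZero())
            throw new IllegalArgumentException("base timeout must be positive: " + base);
        this.base = base;
    }

    /** How long node N keeps trying to reconnect to the next node (assumed: the full base). */
    public Duration connectionRecoveryTimeout() {
        return base;
    }

    /** How long node N+2 spends checking its backward connection (assumed: half the base). */
    public Duration backwardCheckTimeout() {
        return base.dividedBy(2);
    }

    public static void main(String[] args) {
        RelativeTimeouts t = new RelativeTimeouts(Duration.ofSeconds(10));
        System.out.println(t.connectionRecoveryTimeout()); // PT10S
        System.out.println(t.backwardCheckTimeout());      // PT5S
        // Ordering the fix aims for: the check ends before recovery gives up.
        System.out.println(t.backwardCheckTimeout().compareTo(t.connectionRecoveryTimeout()) < 0); // true
    }
}
```

   Because both values scale with the same base, changing the configuration preserves their ordering, which a pair of independent constants cannot guarantee.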
   
   Additions:
   - Added logging for diagnostics. The issue was hard to identify without it.
   - Some renaming and minor optimizations to untangle the ping and connection-check code.

