Vladimir Steshin created IGNITE-13014:
-----------------------------------------
Summary: Remove long, double checking of node availability. Fix
hardcoded values.
Key: IGNITE-13014
URL: https://issues.apache.org/jira/browse/IGNITE-13014
Project: Ignite
Issue Type: Improvement
Reporter: Vladimir Steshin
Assignee: Vladimir Steshin
Currently, node availability is checked twice. This prolongs node failure
detection and brings no additional benefit. The routine also contains messy
logic and hardcoded values.
Suppose node 2 stops responding. Node 1 becomes unable to ping node 2 and
asks node 3 to establish a permanent connection in place of node 2. Even
though node 2 has already been pinged within the configured timeouts, node 3
tries to connect to node 2 as well.
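The first half of this scenario, node 1 skipping the unresponsive node 2 and turning to node 3, can be sketched as a walk around the ring. This is a simplified, hypothetical model: the class RingSketch, the method nextReachable and the integer node ids are illustrative assumptions, not Ignite code.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of the ring described above; not Ignite's implementation.
class RingSketch {
    /**
     * Returns the first node after 'self' in ring order that is not marked
     * unreachable, or 'self' if no other node is reachable.
     */
    static int nextReachable(List<Integer> ring, int self, Set<Integer> unreachable) {
        int start = ring.indexOf(self);

        for (int step = 1; step < ring.size(); step++) {
            int candidate = ring.get((start + step) % ring.size());

            if (!unreachable.contains(candidate))
                return candidate;
        }

        return self;
    }
}
```

With the ring [1, 2, 3] and node 2 marked unreachable, node 1 selects node 3. The complaint in this issue concerns what happens next: node 3 probes node 2 again before accepting the connection.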
Disadvantages:
1) Node failure detection can take as long as
ServerImpl.CON_CHECK_INTERVAL + 2 * IgniteConfiguration.failureDetectionTimeout
+ 300ms. See ‘WostCase.txt’
2) An unexpected, non-configurable decision to check availability of the
previous node, based on ‘2 * ServerImpl.CON_CHECK_INTERVAL’:
// We got message from previous in less than double connection check interval.
boolean ok = rcvdTime + CON_CHECK_INTERVAL * 2 >= now;
If ‘ok == true’, node 3 checks node 2.
3) Double node checking introduces several non-configurable, hardcoded delays:
Node 3 checks node 2 with a hardcoded timeout of 100ms:
ServerImpl.isConnectionRefused():
sock.connect(addr, 100);
The availability check of the previous node treats any exception other than
ConnectException (connection refused) as an existing connection, even a
timeout. See ServerImpl.isConnectionRefused().
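
The three points above can be condensed into a small sketch. It is not the ServerImpl code: the class DoubleCheckSketch and its method names are hypothetical, and java.net.ConnectException is assumed to be the "connection refused" exception the description refers to. The formula and the ‘ok’ condition are taken from the text above.

```java
import java.io.IOException;
import java.net.ConnectException;

// Hypothetical helper mirroring the behaviour described in this issue.
class DoubleCheckSketch {
    /** Point 1: worst-case failure detection time, all values in milliseconds. */
    static long worstCaseDetectionMs(long conCheckIntervalMs, long failureDetectionTimeoutMs) {
        return conCheckIntervalMs + 2 * failureDetectionTimeoutMs + 300;
    }

    /** Point 2: the previous node is re-checked if it was heard from recently. */
    static boolean shouldCheckPrevious(long rcvdTime, long now, long conCheckIntervalMs) {
        return rcvdTime + conCheckIntervalMs * 2 >= now;
    }

    /**
     * Point 3: only a refused connection counts as "node is down". Any other
     * I/O failure, including a timeout, is treated as an existing connection.
     */
    static boolean treatedAsRefused(IOException e) {
        return e instanceof ConnectException;
    }
}
```

As an example with assumed values (a 500 ms connection check interval and a 10000 ms failure detection timeout, neither stated in this issue), worstCaseDetectionMs(500, 10000) yields 20800 ms. A java.net.SocketTimeoutException passed to treatedAsRefused() returns false, which is the case where an unresponsive node is still considered alive.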