[
https://issues.apache.org/jira/browse/IGNITE-13014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vladimir Steshin updated IGNITE-13014:
--------------------------------------
Description:
Proposal:
Do not check a failed node a second time. Double checking prolongs node
failure detection and brings no additional benefit. The routine is also messy
and relies on hardcoded values.
At present, node availability is checked twice. Suppose node 2 stops
responding. Node 1 becomes unable to ping node 2 and asks node 3 to establish
a permanent connection in place of node 2. Node 3 may or may not check node 2
as well.
As a result, detecting a node failure can take up to
ServerImpl.CON_CHECK_INTERVAL + 2 * IgniteConfiguration.failureDetectionTimeout
+ 300ms.
was:
Proposal:
Do not check a failed node a second time. Double checking prolongs node
failure detection and brings no additional benefit. The routine is also messy
and relies on hardcoded values.
At present, node availability is checked twice. Suppose node 2 stops
responding. Node 1 becomes unable to ping node 2 and asks node 3 to establish
a permanent connection in place of node 2. Node 3 may or may not check node 2
as well.
As a result, detecting a node failure can take up to
ServerImpl.CON_CHECK_INTERVAL + 2 * IgniteConfiguration.failureDetectionTimeout
+ 300ms. See ‘WostCase.txt’
> Remove double checking of node availability.
> ---------------------------------------------
>
> Key: IGNITE-13014
> URL: https://issues.apache.org/jira/browse/IGNITE-13014
> Project: Ignite
> Issue Type: Improvement
> Reporter: Vladimir Steshin
> Assignee: Vladimir Steshin
> Priority: Major
> Labels: iep-45
>
> Proposal:
> Do not check a failed node a second time. Double checking prolongs node
> failure detection and brings no additional benefit. The routine is also messy
> and relies on hardcoded values.
> At present, node availability is checked twice. Suppose node 2 stops
> responding. Node 1 becomes unable to ping node 2 and asks node 3 to establish
> a permanent connection in place of node 2. Node 3 may or may not check node 2
> as well.
> As a result, detecting a node failure can take up to
> ServerImpl.CON_CHECK_INTERVAL + 2 * IgniteConfiguration.failureDetectionTimeout
> + 300ms.
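To make the worst-case bound in the description concrete, here is a minimal sketch that only evaluates the quoted formula. It is not Ignite code: the class and method names are hypothetical, and the two input values (500 ms for ServerImpl.CON_CHECK_INTERVAL and 10 000 ms for failureDetectionTimeout) are assumptions chosen for illustration and should be checked against the Ignite version in use.

```java
public class WorstCaseDetection {
    /** Fixed extra delay taken from the formula in the description, in milliseconds. */
    static final long EXTRA_DELAY_MS = 300;

    /**
     * Evaluates CON_CHECK_INTERVAL + 2 * failureDetectionTimeout + 300ms.
     * The factor of 2 reflects the double check described in the issue.
     */
    static long worstCaseMs(long conCheckIntervalMs, long failureDetectionTimeoutMs) {
        return conCheckIntervalMs + 2 * failureDetectionTimeoutMs + EXTRA_DELAY_MS;
    }

    public static void main(String[] args) {
        // Assumed values, not taken from the issue text.
        long conCheckIntervalMs = 500;
        long failureDetectionTimeoutMs = 10_000;

        // 500 + 2 * 10000 + 300 = 20800
        System.out.println(worstCaseMs(conCheckIntervalMs, failureDetectionTimeoutMs) + " ms");
    }
}
```

Under these assumed values the bound comes to 20800 ms, roughly twice the configured failure detection timeout, which is the delay the proposal aims to cut.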