[
https://issues.apache.org/jira/browse/IGNITE-13014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vladimir Steshin updated IGNITE-13014:
--------------------------------------
Attachment: NodeFailureResearch.patch
> Remove double checking of node availability.
> ---------------------------------------------
>
> Key: IGNITE-13014
> URL: https://issues.apache.org/jira/browse/IGNITE-13014
> Project: Ignite
> Issue Type: Improvement
> Reporter: Vladimir Steshin
> Assignee: Vladimir Steshin
> Priority: Major
> Labels: iep-45
> Attachments: NodeFailureResearch.patch, WostCaseStepByStep.txt
>
>
> Proposal:
> Do not check a failed node a second time. Double checking prolongs node
> failure detection and gives no additional benefit. The routine is also a
> mess and relies on hardcoded values.
> Currently, node availability is checked twice. Suppose node 2 stops
> responding. Node 1 becomes unable to ping node 2 and asks node 3 to
> establish a permanent connection in place of node 2. Node 3 may then check
> node 2 as well, or it may not.
> As a result, node failure detection can take as long as
> ServerImpl.CON_CHECK_INTERVAL +
> 2 * IgniteConfiguration.failureDetectionTimeout + 300ms.
> See:
> 'NodeFailureResearch.patch' - creates the test 'FailureDetectionResearch',
> which emulates slow responses from a failed node and measures failure
> detection delays.
> 'NodeFailureResearch.txt' - results of the test.
> 'WostCaseStepByStep.txt' - step-by-step description of how the worst case
> happens.
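
To make the worst-case bound in the description concrete, here is a minimal sketch of the arithmetic. The class and method names are hypothetical and are not part of Ignite. The sample inputs (500 ms for CON_CHECK_INTERVAL, 10,000 ms for failureDetectionTimeout) are assumptions for illustration only; check them against the Ignite version in use.

```java
public class WorstCaseDetection {
    // Fixed extra delay taken from the formula in the issue description.
    static final long FIXED_EXTRA_MS = 300;

    /**
     * Worst-case detection delay as stated in the issue:
     * CON_CHECK_INTERVAL + 2 * failureDetectionTimeout + 300ms.
     * Both arguments are in milliseconds.
     */
    static long worstCaseDetectionMs(long conCheckIntervalMs, long failureDetectionTimeoutMs) {
        return conCheckIntervalMs + 2 * failureDetectionTimeoutMs + FIXED_EXTRA_MS;
    }

    public static void main(String[] args) {
        // Assumed sample values, not read from any Ignite configuration.
        long conCheckIntervalMs = 500;
        long failureDetectionTimeoutMs = 10_000;

        System.out.println(worstCaseDetectionMs(conCheckIntervalMs, failureDetectionTimeoutMs) + " ms");
    }
}
```

With these assumed inputs the bound is more than twice the configured failureDetectionTimeout, which is the delay the proposal aims to reduce by dropping the second check.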
--
This message was sent by Atlassian Jira
(v8.3.4#803005)