[ 
https://issues.apache.org/jira/browse/IGNITE-13014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Steshin updated IGNITE-13014:
--------------------------------------
    Attachment: NodeFailureResearch.patch

> Remove double checking of node availability. 
> ---------------------------------------------
>
>                 Key: IGNITE-13014
>                 URL: https://issues.apache.org/jira/browse/IGNITE-13014
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Vladimir Steshin
>            Assignee: Vladimir Steshin
>            Priority: Major
>              Labels: iep-45
>         Attachments: NodeFailureResearch.patch, WostCaseStepByStep.txt
>
>
> Proposal:
> Do not check a failed node a second time. Double checking prolongs node 
> failure detection and gives no additional benefit. The routine is also messy 
> and contains hardcoded values.
> Currently, node availability is checked twice. Suppose node 2 stops 
> responding. Node 1 becomes unable to ping node 2 and asks node 3 to 
> establish a permanent connection in place of node 2. Node 3 may then check 
> node 2 as well, or it may not.
> As a result, detecting a node failure can take up to 
> ServerImpl.CON_CHECK_INTERVAL + 
> 2 * IgniteConfiguration.failureDetectionTimeout + 300ms.
> See:
> 'NodeFailureResearch.patch' - adds the test 'FailureDetectionResearch', which 
> emulates slow responses from a failed node and measures failure detection delays.
> 'NodeFailureResearch.txt' - results of the test.
> 'WostCaseStepByStep.txt' - step-by-step description of how the worst case happens.
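
To make the worst-case bound from the description concrete, here is a minimal Java sketch that evaluates CON_CHECK_INTERVAL + 2 * failureDetectionTimeout + 300ms. The class and method names are hypothetical and not part of Ignite. The input values (500 ms for the connection check interval, 10000 ms for the failure detection timeout) are assumptions chosen for illustration; check the actual values for your Ignite version and configuration.

```java
/**
 * Illustrative calculator for the worst-case failure detection delay
 * quoted in the issue description. Hypothetical helper, not Ignite code.
 */
final class FailureDetectionEstimate {
    /** Fixed extra delay taken from the formula in the description, in milliseconds. */
    static final long EXTRA_DELAY_MS = 300L;

    /**
     * Computes CON_CHECK_INTERVAL + 2 * failureDetectionTimeout + 300ms.
     * The factor of 2 reflects the double check of the failed node.
     */
    static long worstCaseMillis(long conCheckIntervalMs, long failureDetectionTimeoutMs) {
        return conCheckIntervalMs + 2 * failureDetectionTimeoutMs + EXTRA_DELAY_MS;
    }

    public static void main(String[] args) {
        long conCheckIntervalMs = 500L;           // assumed value, for illustration only
        long failureDetectionTimeoutMs = 10_000L; // assumed value, for illustration only

        System.out.println("Worst-case detection delay: "
            + worstCaseMillis(conCheckIntervalMs, failureDetectionTimeoutMs) + " ms");
    }
}
```

With these assumed inputs the bound is roughly twice the configured failure detection timeout, which is the delay the proposal aims to reduce.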



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
