[jira] [Updated] (IGNITE-13014) Remove double checking of node availability.

Vladimir Steshin (Jira) Thu, 28 May 2020 08:06:24 -0700


     [ 
https://issues.apache.org/jira/browse/IGNITE-13014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Vladimir Steshin updated IGNITE-13014:
--------------------------------------
    Description: 
Proposal:
Do not check a failed node a second time. Double node checking prolongs node 
failure detection and gives no additional benefit. The routine is also messy 
and contains hardcoded values.

Currently, node availability is checked twice. Suppose node 2 stops 
responding. Node 1 becomes unable to ping node 2 and asks node 3 to establish 
a permanent connection in place of node 2. Node 3 may or may not check node 2 
as well.

As a result, node failure detection can take as long as 
ServerImpl.CON_CHECK_INTERVAL + 2 * IgniteConfiguration.failureDetectionTimeout 
+ 300ms.
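
To put a number on this bound, here is a minimal sketch that evaluates the 
formula. The values 500 ms for CON_CHECK_INTERVAL and 10000 ms for 
failureDetectionTimeout are assumptions chosen for illustration (they are not 
stated in this issue), and the class and method names are hypothetical.

```java
public class WorstCaseDetectionDelay {
    /** Fixed extra delay taken from the formula in the description, in ms. */
    static final long HARDCODED_DELAY_MS = 300;

    /**
     * Upper bound on failure detection time according to the formula:
     * CON_CHECK_INTERVAL + 2 * failureDetectionTimeout + 300ms.
     */
    static long worstCaseMs(long conCheckIntervalMs, long failureDetectionTimeoutMs) {
        return conCheckIntervalMs + 2 * failureDetectionTimeoutMs + HARDCODED_DELAY_MS;
    }

    public static void main(String[] args) {
        long conCheckIntervalMs = 500;           // assumed value, for illustration only
        long failureDetectionTimeoutMs = 10_000; // assumed value, for illustration only

        // 500 + 2 * 10000 + 300 = 20800
        System.out.println(worstCaseMs(conCheckIntervalMs, failureDetectionTimeoutMs) + " ms");
    }
}
```

With these assumed values the bound is 20800 ms, roughly twice the configured 
failure detection timeout. The second check accounts for the extra 
failureDetectionTimeout term.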

See:

'NodeFailureResearch.patch' - creates the test 'FailureDetectionResearch', 
which emulates long responses from a failed node and measures failure 
detection delays.
'NodeFailureResearch.txt' - results of the test.
'WostCaseStepByStep.txt' - description of how the worst case happens.



  was:
Proposal:
Do not check a failed node a second time. Double node checking prolongs node 
failure detection and gives no additional benefit. The routine is also messy 
and contains hardcoded values.

Currently, node availability is checked twice. Suppose node 2 stops 
responding. Node 1 becomes unable to ping node 2 and asks node 3 to establish 
a permanent connection in place of node 2. Node 3 may or may not check node 2 
as well.

As a result, node failure detection can take as long as 
ServerImpl.CON_CHECK_INTERVAL + 2 * IgniteConfiguration.failureDetectionTimeout 
+ 300ms.




> Remove double checking of node availability. 
> ---------------------------------------------
>
>                 Key: IGNITE-13014
>                 URL: https://issues.apache.org/jira/browse/IGNITE-13014
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Vladimir Steshin
>            Assignee: Vladimir Steshin
>            Priority: Major
>              Labels: iep-45
>         Attachments: NodeFailureResearch.patch, WostCaseStepByStep.txt
>
>
> Proposal:
> Do not check a failed node a second time. Double node checking prolongs node 
> failure detection and gives no additional benefit. The routine is also messy 
> and contains hardcoded values.
> Currently, node availability is checked twice. Suppose node 2 stops 
> responding. Node 1 becomes unable to ping node 2 and asks node 3 to establish 
> a permanent connection in place of node 2. Node 3 may or may not check node 2 
> as well.
> As a result, node failure detection can take as long as 
> ServerImpl.CON_CHECK_INTERVAL + 2 * 
> IgniteConfiguration.failureDetectionTimeout + 300ms.
> See:
> 'NodeFailureResearch.patch' - creates the test 'FailureDetectionResearch', 
> which emulates long responses from a failed node and measures failure 
> detection delays.
> 'NodeFailureResearch.txt' - results of the test.
> 'WostCaseStepByStep.txt' - description of how the worst case happens.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
