[ https://issues.apache.org/jira/browse/IGNITE-6700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexandr Kuramshin reassigned IGNITE-6700:
------------------------------------------

    Assignee: Alexandr Kuramshin  (was: Semen Boikov)

> Node considered as failed can cause failure of others nodes
> -----------------------------------------------------------
>
>                 Key: IGNITE-6700
>                 URL: https://issues.apache.org/jira/browse/IGNITE-6700
>             Project: Ignite
>          Issue Type: Bug
>      Security Level: Public(Viewable by anyone)
>          Components: general
>            Reporter: Semen Boikov
>            Assignee: Alexandr Kuramshin
>            Priority: Critical
>
> A node considered as failed can cause the failure of other nodes in the cluster.
> There is an issue in TcpDiscoveryAbstractMessage.failedNodes processing: if a
> message is received from a node that is already considered as failed, its
> failedNodes should be ignored.
> Possible scenario:
> - there are 4 nodes (1 -> 2 -> 3 -> 4)
> - node 3 temporarily loses connection with the others
> - node 2 considers 3 as failed, and a node failed event is fired for 3
> - node 3 considers 4 as failed and adds 4 to nodeFailedList, then it restores
> connection with 1, and currently 1 processes nodeFailedList from 3 (even
> though 3 is already considered as failed), so node 4 can be failed because
> of a report from a node that has itself been failed

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
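The rule proposed in the description (ignore failedNodes when the sender is already considered failed) can be sketched roughly as follows. This is a minimal illustration under stated assumptions: DiscoveryMessage, LocalNodeView and processFailedNodes are hypothetical names, not Ignite's actual TcpDiscoveryAbstractMessage API, and the real ring discovery protocol carries much more state than a single set of failed node IDs.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

public class FailedNodesFilterSketch {
    /** Hypothetical, simplified stand-in for a discovery message (not Ignite's TcpDiscoveryAbstractMessage). */
    static final class DiscoveryMessage {
        final UUID senderId;
        final Set<UUID> failedNodes;

        DiscoveryMessage(UUID senderId, Collection<UUID> failedNodes) {
            this.senderId = senderId;
            this.failedNodes = new HashSet<>(failedNodes);
        }
    }

    /** Hypothetical local view of the nodes this node already treats as failed. */
    static final class LocalNodeView {
        final Set<UUID> failed = new HashSet<>();

        /** Applies the message's failedNodes only if the sender itself is not considered failed. */
        void processFailedNodes(DiscoveryMessage msg) {
            if (failed.contains(msg.senderId)) {
                // The sender was already declared failed, so its view of other nodes is ignored.
                return;
            }
            failed.addAll(msg.failedNodes);
        }
    }

    public static void main(String[] args) {
        UUID node3 = UUID.randomUUID();
        UUID node4 = UUID.randomUUID();

        LocalNodeView node1 = new LocalNodeView();
        node1.failed.add(node3); // a node failed event was already fired for node 3

        // Node 3 reconnects to node 1 and reports node 4 as failed.
        node1.processFailedNodes(new DiscoveryMessage(node3, List.of(node4)));

        System.out.println(node1.failed.contains(node4)); // false
    }
}
```

With the check in place, node 1 keeps node 4 in the topology; without it, node 1 would add node 4 to its failed set based only on node 3's stale view.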