[ 
https://issues.apache.org/jira/browse/IGNITE-14068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17415160#comment-17415160
 ] 

Ignite TC Bot commented on IGNITE-14068:
----------------------------------------

{panel:title=Branch: [pull/9393/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/9393/head] Base: [master] : New Tests (3)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#00008b}SPI{color} [[tests 3|https://ci.ignite.apache.org/viewLog.html?buildId=6179706]]
* {color:#013220}IgniteSpiTestSuite: TcpDiscoverySelfTest.testIncomingConnectionsFailure - PASSED{color}
* {color:#013220}IgniteSpiTestSuite: TcpDiscoverySslTrustedSelfTest.testIncomingConnectionsFailure - PASSED{color}
* {color:#013220}IgniteSpiTestSuite: TcpDiscoverySslSelfTest.testIncomingConnectionsFailure - PASSED{color}

{panel}
[TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=6179777&buildTypeId=IgniteTests24Java8_RunAll]

> Infinite node presence in the ring while outgoing connections are lost
> ----------------------------------------------------------------------
>
>                 Key: IGNITE-14068
>                 URL: https://issues.apache.org/jira/browse/IGNITE-14068
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vladimir Steshin
>            Assignee: Vladimir Steshin
>            Priority: Major
>          Time Spent: 10h
>  Remaining Estimate: 0h
>
> If a node loses its outgoing connections, it can decide it is alone in the
> cluster and will not fail. This happens on small clusters where the failed
> node attempts to connect to every other node before connRecoveryTimeout
> expires.
> Consider:
> - The cluster is a ring: n1 -> n2 -> n3 -> n4 -> n1.
> - n4 loses all outgoing connections.
> - n3 still pings n4 successfully.
> - n4 attempts to connect to n1, n2, and n3. Each attempt fails because of the
> outgoing network failure.
> - spi.connRecoveryTimeout has not been reached. n4 decides it is alone and
> continues working.
> - n3 still sends messages to n4, so n4 does not lack incoming connections.
> - The ring is actually broken because of n4, but n3 cannot detect the failure
> of n4.
> Solution: a node could watch its incoming traffic, which indicates that the
> incoming network is working. If all outgoing connections are lost but
> messages are still received, the node must leave the grid to prevent a ring
> break.
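
The proposed rule can be sketched as a small decision function. This is only an illustration of the idea in the description, not the code from the pull request and not part of TcpDiscoverySpi: the class RingHealthCheck, the Decision enum, the decide method and the incomingWindowMs parameter are all hypothetical names, and treating "no outgoing and no recent incoming traffic" as "alone in the cluster" is an assumption based on the behaviour described above.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the proposed check; not Ignite's actual implementation. */
public class RingHealthCheck {
    /** Possible outcomes after a node has tried to reconnect to the ring. */
    enum Decision { STAY_IN_RING, STAY_AS_SINGLE_NODE, LEAVE_RING }

    /**
     * @param outgoingOk       result of each connection attempt to another node
     * @param lastIncomingMs   timestamp of the last message received from the ring
     * @param nowMs            current timestamp
     * @param incomingWindowMs how recent an incoming message must be to count
     *                         as proof that the incoming network works (assumed)
     */
    static Decision decide(List<Boolean> outgoingOk, long lastIncomingMs,
                           long nowMs, long incomingWindowMs) {
        // At least one outgoing connection works: the node is still part of the ring.
        if (outgoingOk.contains(Boolean.TRUE))
            return Decision.STAY_IN_RING;

        // No outgoing connection works. Recent incoming traffic means other nodes
        // are alive and still see this node, so staying would break the ring.
        boolean incomingAlive = nowMs - lastIncomingMs <= incomingWindowMs;

        return incomingAlive ? Decision.LEAVE_RING : Decision.STAY_AS_SINGLE_NODE;
    }

    public static void main(String[] args) {
        // n4 failed to reach n1, n2 and n3, but n3 sent it a message 200 ms ago.
        List<Boolean> attempts = Arrays.asList(false, false, false);

        System.out.println(decide(attempts, 9_800L, 10_000L, 1_000L));
    }
}
```

With the n4 scenario from the description (three failed outgoing attempts, a recent message from n3), the function selects LEAVE_RING instead of letting the node continue as a single-node cluster.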



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
