[ 
https://issues.apache.org/jira/browse/IGNITE-21603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821695#comment-17821695
 ] 

Ignite TC Bot commented on IGNITE-21603:
----------------------------------------

{panel:title=Branch: [pull/11255/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/11255/head] Base: [master] : New Tests (2)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#00008b}SPI (Discovery){color} [[tests 2|https://ci2.ignite.apache.org/viewLog.html?buildId=7762653]]
* {color:#013220}IgniteSpiDiscoverySelfTestSuite: TcpDiscoveryNetworkIssuesTest.testBackwardNodeCheckWithSameLoopbackSeveralLocalAddresses - PASSED{color}
* {color:#013220}IgniteSpiDiscoverySelfTestSuite: TcpDiscoveryNetworkIssuesTest.testBackwardNodeCheckWithSameLoopbackSingleLocalAddress - PASSED{color}

{panel}
[TeamCity *--> Run :: All* Results|https://ci2.ignite.apache.org/viewLog.html?buildId=7762291&buildTypeId=IgniteTests24Java8_RunAll]

> Incorrect backward connection check with loopback address.
> ----------------------------------------------------------
>
>                 Key: IGNITE-21603
>                 URL: https://issues.apache.org/jira/browse/IGNITE-21603
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vladimir Steshin
>            Assignee: Vladimir Steshin
>            Priority: Minor
>              Labels: ise
>             Fix For: 2.17
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> We may skip the backward connection check of a previous node if it has the
> same loopback address as the current node.
> Consider:
> # Neither _IgniteConfiguration#setLocalHost()_ nor 
> _TcpDiscoverySpi#setLocalAddress()_ is set, or the localhost parameter is 
> "_0.0.0.0_".
> # Nodes start on different hosts. All the available host addresses are 
> resolved.
> # Among the other addresses, all nodes get the loopback address 
> "127.0.0.1:47500" (47500 is the default tcp discovery port).
> # Cluster starts and works. 
> # Some node N (A) decides that the connection to node N+1 (B) is lost, 
> tries to connect to node N+2 (C), and sends _TcpDiscoveryHandshakeRequest_.
> # Before C accepts A's incoming connection, it decides to check B and pings 
> it with _ServerImpl#checkConnection(List<InetSocketAddress> addrs, int 
> timeout)_.
> # Around here, the network is restored, and A can now connect to B anew.
> # "_127.0.0.1:47500_" is last in _List<InetSocketAddress>_ addrs by 
> _IgniteUtils#inetAddressesComparator(boolean sameHost)_. But the connect 
> attempts in _checkConnection(...)_ are parallel. "_127.0.0.1:47500_" answers 
> first.
> # C sees it can connect to "_127.0.0.1:47500_" and chooses it as the alive 
> address of B. The pings to the rest of B's addresses are ignored.
> # But "_127.0.0.1:47500_" is one of C's own addresses. C realizes it pinged 
> itself and marks B as unreachable:
> {code:java}
> // If local node was able to connect to previous, confirm that it's alive.
> ok = liveAddr != null && (!liveAddr.getAddress().isLoopbackAddress()
>     || !locNode.socketAddresses().contains(liveAddr));
> {code}
> # C accepts the connection from A and answers with 
> _TcpDiscoveryHandshakeResponse#previousNodeAlive() == false_.
> # B is ok now, but A connects to C and B is kicked from the ring.
> The problem is that C pings itself with B's address "_127.0.0.1:47500_".
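
The decision described in the quoted steps can be reproduced in isolation. The following is only a minimal sketch, not Ignite code: the class name _BackwardCheckSketch_, both method names, and the addresses 10.0.0.2 and 10.0.0.3 are assumptions made for illustration. _previousNodeAlive_ mirrors the quoted condition, and _withoutSharedLoopback_ is one hypothetical mitigation (not necessarily what the pull request does): drop loopback candidates that the local node also owns before pinging.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class BackwardCheckSketch {
    /** Mirrors the condition quoted in the issue; locAddrs stands in for locNode.socketAddresses(). */
    static boolean previousNodeAlive(InetSocketAddress liveAddr, Collection<InetSocketAddress> locAddrs) {
        return liveAddr != null
            && (!liveAddr.getAddress().isLoopbackAddress() || !locAddrs.contains(liveAddr));
    }

    /** Hypothetical mitigation: skip candidates that are loopback and also belong to the local node. */
    static List<InetSocketAddress> withoutSharedLoopback(List<InetSocketAddress> prevAddrs,
        Collection<InetSocketAddress> locAddrs) {
        List<InetSocketAddress> res = new ArrayList<>();

        for (InetSocketAddress addr : prevAddrs) {
            boolean loopback = addr.getAddress() != null && addr.getAddress().isLoopbackAddress();

            if (!(loopback && locAddrs.contains(addr)))
                res.add(addr);
        }

        return res;
    }

    public static void main(String[] args) {
        // Assumed addresses: B is the previous node, C is the local node doing the check.
        InetSocketAddress loopback = new InetSocketAddress("127.0.0.1", 47500);
        InetSocketAddress bExternal = new InetSocketAddress("10.0.0.2", 47500);
        InetSocketAddress cExternal = new InetSocketAddress("10.0.0.3", 47500);

        List<InetSocketAddress> bAddrs = Arrays.asList(bExternal, loopback);
        List<InetSocketAddress> cAddrs = Arrays.asList(cExternal, loopback);

        // The loopback ping answered first: C actually reached itself, so B is reported as not alive.
        System.out.println("loopback answered first: " + previousNodeAlive(loopback, cAddrs));

        // B's external address answered first: B is reported as alive.
        System.out.println("external answered first: " + previousNodeAlive(bExternal, cAddrs));

        // With the shared loopback address filtered out, only B's external address is pinged.
        System.out.println("candidates to ping: " + withoutSharedLoopback(bAddrs, cAddrs));
    }
}
```

With the filter applied, the shared loopback address never takes part in the parallel pings, so the outcome no longer depends on which address answers first.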



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
