[
https://issues.apache.org/jira/browse/IGNITE-21603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821695#comment-17821695
]
Ignite TC Bot commented on IGNITE-21603:
----------------------------------------
{panel:title=Branch: [pull/11255/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/11255/head] Base: [master] : New Tests (2)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#00008b}SPI (Discovery){color} [[tests
2|https://ci2.ignite.apache.org/viewLog.html?buildId=7762653]]
* {color:#013220}IgniteSpiDiscoverySelfTestSuite: TcpDiscoveryNetworkIssuesTest.testBackwardNodeCheckWithSameLoopbackSeveralLocalAddresses - PASSED{color}
* {color:#013220}IgniteSpiDiscoverySelfTestSuite: TcpDiscoveryNetworkIssuesTest.testBackwardNodeCheckWithSameLoopbackSingleLocalAddress - PASSED{color}
{panel}
[TeamCity *--> Run :: All*
Results|https://ci2.ignite.apache.org/viewLog.html?buildId=7762291&buildTypeId=IgniteTests24Java8_RunAll]
> Incorrect backward connection check with loopback address.
> ----------------------------------------------------------
>
> Key: IGNITE-21603
> URL: https://issues.apache.org/jira/browse/IGNITE-21603
> Project: Ignite
> Issue Type: Bug
> Reporter: Vladimir Steshin
> Assignee: Vladimir Steshin
> Priority: Minor
> Labels: ise
> Fix For: 2.17
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> We may skip the backward connection check of a previous node if it has the same
> loopback address as the current node.
> Consider:
> # Neither _IgniteConfiguration#setLocalHost()_ nor
> _TcpDiscoverySpi#setLocalAddress()_ is set, or the localhost parameter is
> "_0.0.0.0_".
> # Nodes start on different hosts. All the available host addresses are
> resolved.
> # Among the other addresses, all nodes get the loopback address
> "127.0.0.1:47500" (47500 is the default tcp discovery port).
> # Cluster starts and works.
> # Some node N (A) decides that the connection to node N+1 (B) is lost,
> tries to connect to node N+2 (C), and sends _TcpDiscoveryHandshakeRequest_.
> # Before C accepts A's incoming connection, it decides to check B and pings
> it with _ServerImpl#checkConnection(List<InetSocketAddress> addrs, int
> timeout)_.
> # Around here, the network is restored, and A can now connect to B anew.
> # "_127.0.0.1:47500_" is sorted last in _List<InetSocketAddress> addrs_ by
> _IgniteUtils#inetAddressesComparator(boolean sameHost)_, but the connect
> attempts in _checkConnection(...)_ run in parallel, and "_127.0.0.1:47500_"
> answers first.
> # C sees it can connect to "_127.0.0.1:47500_" and chooses it as the alive
> address of B. Pings to the rest of B's addresses are ignored.
> # But "_127.0.0.1:47500_" is one of C's addresses. C realizes it pinged
> itself and marks that B is not reachable:
> {code:java}
> // If local node was able to connect to previous, confirm that it's alive.
> ok = liveAddr != null && (!liveAddr.getAddress().isLoopbackAddress()
>     || !locNode.socketAddresses().contains(liveAddr));
> {code}
> # C accepts the connection from A and answers with
> _TcpDiscoveryHandshakeResponse#previousNodeAlive() == false_.
> # B is ok now, but A connects to C and B is kicked from the ring.
> The problem is that C pings itself with B's address "_127.0.0.1:47500_".
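
The scenario above can be reproduced outside Ignite with plain java.net types. The following is a minimal sketch, not the change made in pull/11255: the class name, both method names, and the host addresses "10.0.0.2" and "10.0.0.3" are hypothetical, and a plain set stands in for _locNode.socketAddresses()_. The first method mirrors the quoted check. The second shows one possible mitigation, assuming it is acceptable to drop B's loopback addresses that C also owns before pinging.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LoopbackCheckSketch {
    /** Mirrors the quoted check; locAddrs stands in for locNode.socketAddresses(). */
    static boolean previousNodeAlive(InetSocketAddress liveAddr, Set<InetSocketAddress> locAddrs) {
        return liveAddr != null
            && (!liveAddr.getAddress().isLoopbackAddress() || !locAddrs.contains(liveAddr));
    }

    /** Hypothetical mitigation: skip loopback addresses that the local node also owns. */
    static List<InetSocketAddress> addressesWorthPinging(List<InetSocketAddress> prevAddrs,
        Set<InetSocketAddress> locAddrs) {
        List<InetSocketAddress> res = new ArrayList<>();

        for (InetSocketAddress addr : prevAddrs) {
            boolean selfLoopback = addr.getAddress().isLoopbackAddress() && locAddrs.contains(addr);

            if (!selfLoopback)
                res.add(addr);
        }

        return res;
    }

    public static void main(String[] args) {
        InetSocketAddress loopback = new InetSocketAddress("127.0.0.1", 47500);
        InetSocketAddress bHost = new InetSocketAddress("10.0.0.2", 47500); // Assumed address of B.
        InetSocketAddress cHost = new InetSocketAddress("10.0.0.3", 47500); // Assumed address of C.

        Set<InetSocketAddress> cAddrs = new HashSet<>(Arrays.asList(cHost, loopback));
        List<InetSocketAddress> bAddrs = Arrays.asList(bHost, loopback);

        // Loopback answered first: C actually pinged itself, so B is reported as dead.
        System.out.println(previousNodeAlive(loopback, cAddrs)); // false

        // B's real address answered first: B is reported as alive.
        System.out.println(previousNodeAlive(bHost, cAddrs)); // true

        // With the mitigation, only B's non-shared address is left to ping.
        System.out.println(addressesWorthPinging(bAddrs, cAddrs).size()); // 1
    }
}
```

With the shared loopback address filtered out up front, the result no longer depends on which parallel connect attempt answers first.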
--
This message was sent by Atlassian Jira
(v8.20.10#820010)