[ https://issues.apache.org/jira/browse/IGNITE-21603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Steshin updated IGNITE-21603:
--------------------------------------
    Description: 
We may effectively skip the backward connection check to the previous node if that node has the same loopback address as the current node. The check can then wrongly report the previous node as unreachable.
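The condition behind this can be isolated as a small predicate. In the sketch below, only the boolean expression comes from the _ServerImpl_ snippet quoted in the scenario. The class _LoopbackCheckSketch_, the method _confirmsPreviousNode_ and the example addresses are hypothetical.

{code:java}
import java.net.InetSocketAddress;
import java.util.List;

// Hypothetical sketch; only the boolean expression is taken from the quoted ServerImpl snippet.
class LoopbackCheckSketch {
    /**
     * Returns true only if some address of the previous node answered and that address
     * is not a loopback address that the local node owns as well.
     */
    static boolean confirmsPreviousNode(InetSocketAddress liveAddr, List<InetSocketAddress> locAddrs) {
        return liveAddr != null
            && (!liveAddr.getAddress().isLoopbackAddress() || !locAddrs.contains(liveAddr));
    }

    public static void main(String[] args) {
        InetSocketAddress loopback = new InetSocketAddress("127.0.0.1", 47500);
        // Illustrative addresses of the local node C: a LAN address plus the shared loopback one.
        List<InetSocketAddress> cAddrs = List.of(new InetSocketAddress("10.0.0.3", 47500), loopback);

        // B answered on its LAN address: confirmed alive.
        System.out.println(confirmsPreviousNode(new InetSocketAddress("10.0.0.2", 47500), cAddrs)); // true
        // The first answer came from 127.0.0.1:47500, i.e. from C itself: B is reported dead.
        System.out.println(confirmsPreviousNode(loopback, cAddrs)); // false
    }
}
{code}

A _null_ live address (nothing answered) also yields _false_, so a self-answer on a shared loopback address is indistinguishable from a dead previous node.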

Consider:
# Neither _IgniteConfiguration#setLocalHost()_ nor _TcpDiscoverySpi#setLocalAddress()_ is set, or the local host parameter is set to "_0.0.0.0_".
# Nodes start on different hosts, and all available host addresses are resolved.
# Among their other addresses, all nodes get the loopback address "_127.0.0.1:47500_" (47500 is the default TCP discovery port).
# The cluster starts and works normally.
# Then some node N (A) decides that its connection to node N+1 (B) is lost, tries to connect to node N+2 (C), and sends it a _TcpDiscoveryHandshakeRequest_.
# Before accepting A's incoming connection, C decides to check B and pings it with _ServerImpl#checkConnection(List<InetSocketAddress> addrs, int timeout)_.
# Around this moment, the network is restored, and A could connect to B again.
# "_127.0.0.1:47500_" is sorted last in _List<InetSocketAddress> addrs_ by _IgniteUtils#inetAddressesComparator(boolean sameHost)_, but the connection attempts in _checkConnection(...)_ run in parallel, so "_127.0.0.1:47500_" answers first.
# C sees that it can connect to "_127.0.0.1:47500_" and chooses it as B's alive address. The pings to B's remaining addresses are ignored.
# However, "_127.0.0.1:47500_" is also one of C's own addresses. C realizes that it has pinged itself and decides that B is not reachable:
{code:java}
// If local node was able to connect to previous, confirm that it's alive.
ok = liveAddr != null && (!liveAddr.getAddress().isLoopbackAddress()
    || !locNode.socketAddresses().contains(liveAddr));
{code}
# C accepts the connection from A and answers with _TcpDiscoveryHandshakeResponse#previousNodeAlive() == false_.
# B is actually fine by now, but A still connects to C, and B is kicked out of the ring.
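The race in the steps above can be reproduced in plain Java. This is a hypothetical simulation, not _ServerImpl#checkConnection(...)_: it assumes a comparator that only moves loopback addresses to the end (a stand-in for _IgniteUtils#inetAddressesComparator(boolean sameHost)_, whose actual rules may differ), replaces socket connects with sleeps of made-up latencies, and keeps whichever ping completes first. _ParallelPingRaceSketch_ and its members are illustrative names.

{code:java}
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical simulation of the race; not Ignite code.
class ParallelPingRaceSketch {
    // Made-up round-trip times in milliseconds, keyed by the address being pinged.
    static final Map<InetSocketAddress, Long> LATENCY_MS = Map.of(
        new InetSocketAddress("10.0.0.2", 47500), 300L,    // B's LAN address, slow while the network recovers
        new InetSocketAddress("192.168.1.2", 47500), 400L, // B's second interface
        new InetSocketAddress("127.0.0.1", 47500), 5L);    // actually answered by C itself

    // Assumed ordering: non-loopback addresses first, loopback addresses last.
    static final Comparator<InetSocketAddress> LOOPBACK_LAST =
        Comparator.comparing(a -> a.getAddress().isLoopbackAddress());

    /** Pings all addresses in parallel and returns the first one that answers. */
    static InetSocketAddress firstResponder(List<InetSocketAddress> addrs) {
        ExecutorService pool = Executors.newFixedThreadPool(addrs.size());
        try {
            List<CompletableFuture<InetSocketAddress>> pings = new ArrayList<>();
            for (InetSocketAddress addr : addrs) {
                pings.add(CompletableFuture.supplyAsync(() -> {
                    try {
                        Thread.sleep(LATENCY_MS.get(addr)); // stand-in for a socket connect
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    return addr;
                }, pool));
            }
            // The earliest completed ping wins; later answers are never looked at.
            return (InetSocketAddress) CompletableFuture.anyOf(pings.toArray(new CompletableFuture<?>[0])).join();
        } finally {
            pool.shutdownNow(); // interrupt the pings that are still in flight
        }
    }

    public static void main(String[] args) {
        List<InetSocketAddress> addrs = new ArrayList<>(LATENCY_MS.keySet());
        addrs.sort(LOOPBACK_LAST);
        System.out.println("Ping order: " + addrs);                 // loopback address is last
        System.out.println("Chosen:     " + firstResponder(addrs)); // ...but it answers first
    }
}
{code}

Sorting alone does not help here: the loopback address is pinged last, but because nothing waits for the slower remote answers, its near-zero latency lets it win.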

The problem is that C pings itself via B's address "_127.0.0.1:47500_".
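One possible direction, offered as an assumption rather than the fix chosen for this issue: before pinging the previous node, drop every loopback address that the local node also owns, so a self-answer can never be mistaken for the previous node. _SelfAddressFilterSketch_ and _withoutOwnLoopback_ are hypothetical names.

{code:java}
import java.net.InetSocketAddress;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical mitigation sketch; not Ignite code.
class SelfAddressFilterSketch {
    /** Returns the previous node's addresses minus loopback addresses the local node also listens on. */
    static List<InetSocketAddress> withoutOwnLoopback(List<InetSocketAddress> prevNodeAddrs,
                                                      List<InetSocketAddress> locAddrs) {
        return prevNodeAddrs.stream()
            .filter(a -> !(a.getAddress().isLoopbackAddress() && locAddrs.contains(a)))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        InetSocketAddress loopback = new InetSocketAddress("127.0.0.1", 47500);
        List<InetSocketAddress> bAddrs = List.of(new InetSocketAddress("10.0.0.2", 47500), loopback);
        List<InetSocketAddress> cAddrs = List.of(new InetSocketAddress("10.0.0.3", 47500), loopback);

        // Only B's LAN address is left to ping, so 127.0.0.1:47500 can no longer answer on B's behalf.
        System.out.println(withoutOwnLoopback(bAddrs, cAddrs).size()); // 1
    }
}
{code}

On a single host, two nodes normally bind different discovery ports, so the filter drops only addresses that actually point back at the local node.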



> Incorrect backward connection check with loopback address
> ----------------------------------------------------------
>
>                 Key: IGNITE-21603
>                 URL: https://issues.apache.org/jira/browse/IGNITE-21603
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Vladimir Steshin
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
