[ 
https://issues.apache.org/jira/browse/IGNITE-13663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Steshin reassigned IGNITE-13663:
-----------------------------------------

    Assignee: Denis A. Magda  (was: Vladimir Steshin)

> Document the effect of several node addresses on failure 
> detection v2.
> ------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-13663
>                 URL: https://issues.apache.org/jira/browse/IGNITE-13663
>             Project: Ignite
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 2.7.6, 2.9, 2.8.1
>            Reporter: Vladimir Steshin
>            Assignee: Denis A. Magda
>            Priority: Major
>              Labels: iep-45
>             Fix For: 2.10
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should document that TcpDiscoverySpi takes longer to detect a node failure 
> if the node has several addresses. 
> By default, all available addresses are assigned to the node and the node 
> listens on any address (0.0.0.0), not on the first non-loopback address as 
> the documentation says. A simple example from my ordinary Mac with WiFi, VPN 
> and Docker (from the Ignite log): `Local node addresses: [192.168.1.42/0:0:0:0:0:0:0:1%lo0, 
> /127.0.0.1, /192.168.1.42]`.
> It is clearly seen that `ServerImpl.TcpServer.srvrSock` binds to '0.0.0.0'.
> The actual delay of failure detection and connection recovery is 
> `failureDetectionTimeout * addresses_number + connRecoveryTimeout`, which is 
> usually unexpected. This peculiarity was found in [1], [2] and 
> additionally confirmed by the ducktape integration test [3].
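
To make the formula above concrete, here is a minimal arithmetic sketch in plain Java. The class and method names are hypothetical, and the 10-second values for both timeouts are illustrative assumptions, not values taken from this issue.

```java
public final class DetectionDelay {
    /**
     * Worst-case delay in milliseconds, computed exactly as in the quoted formula:
     * failureDetectionTimeout * addresses_number + connRecoveryTimeout.
     */
    static long worstCaseDelayMs(long failureDetectionTimeoutMs,
                                 int addressCount,
                                 long connRecoveryTimeoutMs) {
        if (addressCount < 1)
            throw new IllegalArgumentException("A node has at least one address");

        return failureDetectionTimeoutMs * addressCount + connRecoveryTimeoutMs;
    }

    public static void main(String[] args) {
        long failureDetectionTimeoutMs = 10_000; // assumed value, for illustration only
        long connRecoveryTimeoutMs = 10_000;     // assumed value, for illustration only

        // Each extra advertised address adds one more failureDetectionTimeout.
        for (int n = 1; n <= 3; n++)
            System.out.println(n + " address(es): "
                + worstCaseDelayMs(failureDetectionTimeoutMs, n, connRecoveryTimeoutMs) + " ms");
    }
}
```

With these assumed timeouts, a node advertising the three addresses from the log line above would take up to 40000 ms, versus 20000 ms with a single address.
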
> To avoid this, the user should set `IgniteConfiguration.localHost` or 
> `TcpDiscoverySpi.localAddress`. Unfortunately, users frequently skip this 
> setting and let the node use all available IPs.
> Middleware often runs in environments with several IP addresses 
> (virtualization, containers, different networks). A node sends all obtained 
> addresses, along with its other node info, to the cluster. A connection to 
> the node is established to the first of its addresses. If that connection is 
> lost, the other addresses are tried sequentially on reconnect. If those 
> addresses do not belong to the intended node network or do not represent an 
> existing physical connection, processing them is just a waste of time. 
> [1] https://issues.apache.org/jira/browse/IGNITE-13012
> [2] https://issues.apache.org/jira/browse/IGNITE-13134
> [3] 
> https://github.com/apache/ignite/blob/ignite-ducktape/modules/ducktests/tests/ignitetest/tests/discovery_test.py



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
