[ https://issues.apache.org/jira/browse/IGNITE-13663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vladimir Steshin reassigned IGNITE-13663:
-----------------------------------------

    Assignee: Denis A. Magda  (was: Vladimir Steshin)

> Represent in the documentation the effect of several node addresses on failure detection v2.
> ------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-13663
>                 URL: https://issues.apache.org/jira/browse/IGNITE-13663
>             Project: Ignite
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 2.7.6, 2.9, 2.8.1
>            Reporter: Vladimir Steshin
>            Assignee: Denis A. Magda
>            Priority: Major
>              Labels: iep-45
>             Fix For: 2.10
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should document that TcpDiscoverySpi takes longer to detect a node
> failure when the node has several addresses.
> By default, all available addresses are assigned to the node, and the node
> listens on any address (0.0.0.0), not on the first non-loopback address as
> the documentation says. A simple example from my ordinary Mac with WiFi, VPN
> and Docker (from the Ignite log): `Local node addresses:
> [192.168.1.42/0:0:0:0:0:0:0:1%lo0, /127.0.0.1, /192.168.1.42]`.
> It is clearly seen that `ServerImpl.TcpServer.srvrSock` binds to '0.0.0.0'.
> The actual delay for failure detection and connection recovery is
> `failureDetectionTimeout * addresses_number + connRecoveryTimeout`, which is
> usually not what users expect. This peculiarity was uncovered in [1], [2] and
> additionally confirmed in the ducktape integration test [3].
> To avoid this, the user should set `IgniteConfiguration.localHost` or
> `TcpDiscoverySpi.localAddress`. Unfortunately, users frequently skip this
> setting and let the node use all available IPs.
> Middleware often runs in environments with several IP addresses
> (virtualization, containers, different networks). The node sends all the
> addresses it obtained, along with other node info, to the cluster. A
> connection to the node is established to the first of its addresses. If that
> connection is lost, the other addresses are tried sequentially to reconnect.
> If those addresses do not belong to the network the node is expected to use,
> or do not correspond to an existing physical connection, processing them is
> just a waste of time.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-13012
> [2] https://issues.apache.org/jira/browse/IGNITE-13134
> [3] https://github.com/apache/ignite/blob/ignite-ducktape/modules/ducktests/tests/ignitetest/tests/discovery_test.py



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
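
For illustration, the delay formula quoted in the issue, `failureDetectionTimeout * addresses_number + connRecoveryTimeout`, can be worked through as a small Python sketch. The function name and the 10000 ms timeout values are placeholders chosen for this example and are not stated in the issue; the address count of 3 matches the `Local node addresses` log line quoted above.

```python
def worst_case_detection_delay_ms(failure_detection_timeout_ms,
                                  address_count,
                                  conn_recovery_timeout_ms):
    """Apply the formula from the issue:
    failureDetectionTimeout * addresses_number + connRecoveryTimeout.
    """
    if address_count < 1:
        raise ValueError("a node advertises at least one address")
    return failure_detection_timeout_ms * address_count + conn_recovery_timeout_ms


# Placeholder timeouts for illustration only (assumed, not from the issue).
FAILURE_DETECTION_TIMEOUT_MS = 10_000
CONN_RECOVERY_TIMEOUT_MS = 10_000

# 1 address: node pinned to a single local address.
# 3 addresses: the count seen in the log line quoted in the issue.
for count in (1, 3):
    delay = worst_case_detection_delay_ms(
        FAILURE_DETECTION_TIMEOUT_MS, count, CONN_RECOVERY_TIMEOUT_MS
    )
    print(f"{count} address(es): {delay} ms")
```

Under these assumed timeouts, a node advertising three addresses gives 40000 ms against 20000 ms for a node bound to one address, which is the gap the issue suggests closing by setting `IgniteConfiguration.localHost` or `TcpDiscoverySpi.localAddress`.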