Aleksey Plekhanov created IGNITE-21630:
------------------------------------------

             Summary: Cluster falls apart on topology change when DNS service is unavailable
                 Key: IGNITE-21630
                 URL: https://issues.apache.org/jira/browse/IGNITE-21630
             Project: Ignite
          Issue Type: Bug
            Reporter: Aleksey Plekhanov
            Assignee: Aleksey Plekhanov


Requests to the DNS service are performed synchronously by some critical discovery threads. The timeout for such requests can't be controlled from Java code (see [https://bugs.openjdk.org/browse/JDK-6450279]). When the DNS service is unavailable, these threads block on the lookup, which leads to node segmentation and the cluster falling apart.

For example, the stack of a {{tcp-disco-msg-worker}} thread blocked on a request to the DNS service:
{noformat}
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1330)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1283)
    at java.net.InetAddress.getAllByName(InetAddress.java:1199)
    at java.net.InetAddress.getAllByName(InetAddress.java:1127)
    at java.net.InetAddress.getByName(InetAddress.java:1077)
    at java.net.InetSocketAddress.<init>(InetSocketAddress.java:220)
    at org.apache.ignite.internal.util.IgniteUtils.createResolved(IgniteUtils.java:9829)
    at org.apache.ignite.internal.util.IgniteUtils.toSocketAddresses(IgniteUtils.java:9792)
    at org.apache.ignite.internal.util.IgniteUtils.toSocketAddresses(IgniteUtils.java:9770)
    at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode.socketAddresses(TcpDiscoveryNode.java:392)
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.getNodeAddresses(TcpDiscoverySpi.java:1267)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.interruptPing(ServerImpl.java:985)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.access$6800(ServerImpl.java:206)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processNodeLeftMessage(ServerImpl.java:5433)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:3221)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2894)
{noformat}
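
To illustrate the problem described in this issue: {{InetAddress.getByName}} takes no timeout argument, so a caller can only bound its own wait by running the lookup on another thread. The following is a minimal sketch of that general technique, not the fix applied in Ignite. The class {{BoundedDnsResolver}}, the method {{resolve}}, the thread name, and the timeout parameter are all hypothetical names introduced for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not part of Ignite): bounds how long a caller waits for a DNS lookup. */
public class BoundedDnsResolver {
    /** Daemon threads, so a lookup stuck in the OS resolver cannot keep the JVM alive. */
    private static final ExecutorService RESOLVER_POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "dns-resolver");
        t.setDaemon(true);
        return t;
    });

    /**
     * Resolves {@code host}, waiting at most {@code timeoutMs} milliseconds.
     *
     * @throws UnknownHostException if the host is unknown, the wait times out, or the caller is interrupted.
     */
    public static InetAddress resolve(String host, long timeoutMs) throws UnknownHostException {
        // The blocking lookup runs on a pool thread; only the wait below is bounded.
        Future<InetAddress> fut = RESOLVER_POOL.submit(() -> InetAddress.getByName(host));

        try {
            return fut.get(timeoutMs, TimeUnit.MILLISECONDS);
        }
        catch (TimeoutException e) {
            // Best effort only: the native lookup may keep running in the background.
            fut.cancel(true);

            throw new UnknownHostException("DNS lookup timed out after " + timeoutMs + " ms: " + host);
        }
        catch (ExecutionException e) {
            if (e.getCause() instanceof UnknownHostException)
                throw (UnknownHostException)e.getCause();

            throw new IllegalStateException("Unexpected failure while resolving " + host, e);
        }
        catch (InterruptedException e) {
            // Restore the interrupt flag so the calling worker can react to it.
            Thread.currentThread().interrupt();
            fut.cancel(true);

            throw new UnknownHostException("Interrupted while resolving " + host);
        }
    }
}
```

A caller would use {{BoundedDnsResolver.resolve(host, 1000)}} in place of a direct {{InetAddress.getByName(host)}} call. Note the trade-off: the calling thread is released after the timeout, but the pool thread may stay blocked until the underlying lookup returns, so during a long DNS outage the pool can accumulate stuck threads.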