The way I "solved" this problem was to modify both so that the external IP
addresses (the ones in the cluster node's ATTR_EXT_ADDRS attribute) are listed
first in the returned collection.
This did fix the problem and significantly reduced the connection time, since
Ignite no longer wasted time attempting to connect to the remote node's internal
Docker IP. Each such attempt ends in a socket timeout (2 seconds by default),
and with multiple nodes this made cluster startup very slow and unreliable.
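The reordering idea above can be sketched roughly as follows. This is a
standalone illustration with hypothetical names (`externalFirst`, plain
strings for addresses), not the actual patch; a real change would operate on
the `InetSocketAddress` collections inside the SPI:

```java
import java.util.*;

public class ExternalAddressFirst {
    // Stable partition: addresses found in the "external" set come first,
    // everything else follows in its original relative order. Connection
    // attempts then hit the reachable external IPs before the Docker-internal
    // ones that would only time out.
    static List<String> externalFirst(List<String> all, Set<String> external) {
        List<String> result = new ArrayList<>(all.size());
        for (String addr : all) {
            if (external.contains(addr)) result.add(addr);
        }
        for (String addr : all) {
            if (!external.contains(addr)) result.add(addr);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical example: one external address, two internal ones.
        List<String> addrs = Arrays.asList("172.17.0.2", "10.0.1.15", "192.168.1.20");
        Set<String> ext = new HashSet<>(Collections.singletonList("192.168.1.20"));
        System.out.println(externalFirst(addrs, ext));
        // prints [192.168.1.20, 172.17.0.2, 10.0.1.15]
    }
}
```

The key property is that the reorder is stable: it only changes priority, so
if the external address is unreachable the node still falls back to the
remaining addresses exactly as before.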
Of course, having a Docker Swarm with an overlay network would probably solve
this problem more elegantly without any code changes, but I'm not a Docker
expert and Docker Swarm is not my target execution environment anyway. I'd like
to be able to deploy Ignite nodes in standalone containers and have them join
the cluster as if they were running on physical hardware.
Hope it helps.
From: Sergey Chugunov <sergey.chugu...@gmail.com>
Sent: Friday, February 9, 2018 3:54 AM
Subject: TcpCommunicationSpi in dockerized environment
Hello Ignite community,
When testing Ignite in a dockerized environment I ran into the following issue
with the current TcpCommunicationSpi implementation.
I had several physical machines, and each Ignite node running inside a Docker
container had at least two InetAddresses associated with it: one IP address
associated with the physical host and one additional IP address of the Docker
bridge interface *which was default and the same across all physical machines*.
Each node publishes the address of its Docker bridge in its list of addresses,
although that address is not reachable from remote nodes.
So when a node tries to establish a communication connection using the remote
node's Docker address, the request goes back to the node itself, as if it were
a loopback address.
I would suggest implementing a simple heuristic to avoid this: before
connecting to a remote node's address, CommunicationSpi should check
whether the local node has exactly the same address. If the "remote" and local
addresses are the same, CommunicationSpi should skip that address in the
remote node's list and proceed with the next one.
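The heuristic could be sketched like this. Again a standalone illustration
with hypothetical names (`reachableCandidates`, string addresses), not the
actual TcpCommunicationSpi code:

```java
import java.util.*;

public class SkipLocalAddresses {
    // Drop remote addresses that are identical to one of the local node's
    // own addresses. Inside Docker, the bridge IP can be the same on every
    // host, so connecting to it would loop back to the local node instead
    // of reaching the remote one.
    static List<String> reachableCandidates(Collection<String> remoteAddrs,
                                            Set<String> localAddrs) {
        List<String> candidates = new ArrayList<>();
        for (String addr : remoteAddrs) {
            if (!localAddrs.contains(addr)) {
                candidates.add(addr);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Hypothetical example: both nodes share the default bridge address
        // 172.17.0.2, so it is skipped when dialing the remote node.
        Set<String> local = new HashSet<>(Arrays.asList("172.17.0.2", "10.0.1.15"));
        List<String> remote = Arrays.asList("172.17.0.2", "10.0.1.16");
        System.out.println(reachableCandidates(remote, local));
        // prints [10.0.1.16]
    }
}
```

One risk worth noting with this filter: a node genuinely may need to connect
to itself (e.g. a loopback connection to its own address), so the real
implementation would have to exclude that case from the skip logic.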
Is it safe to implement such a heuristic in TcpCommunicationSpi, or are there
risks I'm missing? I would really appreciate help from anyone with deep
knowledge of the Communication mechanics.
If such improvement makes sense I'll file a ticket and start working on it.