[
https://issues.apache.org/jira/browse/IGNITE-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542798#comment-17542798
]
Yurii commented on IGNITE-15996:
--------------------------------
Same issue with a dockerized Ignite cluster on Linux, with the containers configured with
`NetworkMode: host`
OS: `Ubuntu 20.04.3`, Docker version: `20.10.10`, Ignite image:
`apacheignite/ignite:2.8.1`
My static discovery SPI configuration:
{code:java}
// xml
<property name="discoverySpi">
  <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
    <property name="ipFinder">
      <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
        <property name="addresses">
          <list>
            <value>10.107.0.70:47500..47510</value>
            <value>10.107.0.218:47500..47510</value>
          </list>
        </property>
      </bean>
    </property>
  </bean>
</property>{code}
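For readers unfamiliar with the `host:first..last` notation in the address list, here is a minimal Python sketch of how each entry expands into candidate discovery endpoints. The helper name `expand_address` is hypothetical, it handles only IPv4 `host:port` and `host:first..last` entries, and it assumes the port range is inclusive on both ends.

```python
from typing import List, Tuple


def expand_address(entry: str) -> List[Tuple[str, int]]:
    """Expand 'host:port' or 'host:first..last' into (host, port) pairs."""
    host, sep, ports = entry.rpartition(":")
    if not sep:
        raise ValueError(f"expected host:port or host:first..last, got {entry!r}")
    first, dots, last = ports.partition("..")
    low = int(first)
    high = int(last) if dots else low
    # Assumption: the range is inclusive on both ends.
    return [(host, port) for port in range(low, high + 1)]


# The two entries from the configuration above.
addresses = ["10.107.0.70:47500..47510", "10.107.0.218:47500..47510"]
candidates = [pair for entry in addresses for pair in expand_address(entry)]

print(len(candidates))   # 22 (11 ports for each of the 2 hosts)
print(candidates[0])     # ('10.107.0.70', 47500)
```

Under that assumption, a joining node has 22 endpoints to probe across the two hosts.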
I provision the cluster with Ansible, so the nodes start in parallel. In that
case the cluster forms with 2 server nodes and everything is OK:
{code:java}
// shell
visor> top
Hosts: 4
+============================================================================================================================================================+
| Int./Ext. IPs | Node ID8(@)      | Node consistent ID                      | Node Type | OS                          | CPUs | MACs              | CPU Load |
+============================================================================================================================================================+
| 10.107.0.70   | 1: C7675661(@n0) | 10.107.0.70,127.0.0.1,172.17.0.1:47500  | Server    | Linux amd64 5.11.0-1020-aws | 4    | 02:42:01:27:D9:54 | 0.33 %   |
| 127.0.0.1     |                  |                                         |           |                             |      | 06:F6:EE:EA:D4:78 |          |
| 172.17.0.1    |                  |                                         |           |                             |      |                   |          |
+---------------+------------------+-----------------------------------------+-----------+-----------------------------+------+-------------------+----------+
| 10.107.0.218  | 1: 1EBB412F(@n1) | 10.107.0.218,127.0.0.1,172.17.0.1:47500 | Server    | Linux amd64 5.11.0-1020-aws | 4    | 02:42:D5:BF:B2:42 | 0.40 %   |
| 127.0.0.1     |                  |                                         |           |                             |      | 06:86:27:0D:5D:7C |          |
| 172.17.0.1    |                  |                                         |           |                             |      |                   |          |
+---------------+------------------+-----------------------------------------+-----------+-----------------------------+------+-------------------+----------+{code}
But if I restart a single node of the cluster, it fails to rejoin with the
following exception:
{code:java}
// shell
[06:25:08,581][SEVERE][main][IgniteKernal] Failed to start manager: GridManagerAdapter [enabled=true, name=o.a.i.i.managers.discovery.GridDiscoveryManager]
class org.apache.ignite.IgniteCheckedException: Failed to start SPI: TcpDiscoverySpi [addrRslvr=null, sockTimeout=5000, ackTimeout=5000, marsh=JdkMarshaller [clsFilter=org.apache.ignite.marshaller.MarshallerUtils$1@7d070ef5], reconCnt=10, reconDelay=2000, maxAckTimeout=600000, soLinger=5, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null, skipAddrsRandomization=false]
    at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:302)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:943)
    at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1960)
    at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1276)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2045)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1703)
    at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1117)
    at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1035)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:921)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:820)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:690)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:659)
    at org.apache.ignite.Ignition.start(Ignition.java:346)
    at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:300)
Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with the same ID was found in node IDs history or existing node in topology has the same ID (fix configuration and restart local node) [localNode=TcpDiscoveryNode [id=0981cd22-4616-43a5-bccd-3e28762247fd, consistentId=10.107.0.70,127.0.0.1,172.17.0.1:47500, addrs=ArrayList [10.107.0.70, 127.0.0.1, 172.17.0.1], sockAddrs=HashSet [ip-172-17-0-1.eu-west-2.compute.internal/172.17.0.1:47500, /10.107.0.70:47500, /127.0.0.1:47500], discPort=47500, order=0, intOrder=0, lastExchangeTime=1653632688431, loc=true, ver=2.8.1#20200521-sha1:86422096, isClient=false], existingNode=0981cd22-4616-43a5-bccd-3e28762247fd]
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.duplicateIdError(TcpDiscoverySpi.java:1975)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:1112)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:427)
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2099)
    at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:299)
    ... 13 more
[06:25:08,582][SEVERE][main][IgniteKernal] Got exception while starting (will rollback startup routine).{code}
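When reading the exception above, the two fields worth comparing are the local node's `id` and the reported `existingNode`. The following Python sketch pulls them out of a shortened excerpt of that message. The regular expressions and the helper name `duplicate_id_fields` are assumptions based only on this one log line, not on any documented message format.

```python
import re

# Shortened excerpt of the IgniteSpiException message from the log above
# (sockAddrs and a few other fields are left out for brevity).
message = (
    "Node with the same ID was found in node IDs history or existing node in "
    "topology has the same ID (fix configuration and restart local node) "
    "[localNode=TcpDiscoveryNode [id=0981cd22-4616-43a5-bccd-3e28762247fd, "
    "consistentId=10.107.0.70,127.0.0.1,172.17.0.1:47500, "
    "addrs=ArrayList [10.107.0.70, 127.0.0.1, 172.17.0.1], discPort=47500, "
    "isClient=false], existingNode=0981cd22-4616-43a5-bccd-3e28762247fd]"
)

UUID = r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"


def duplicate_id_fields(msg):
    """Return (local node id, existing node id, advertised addresses)."""
    local = re.search(r"localNode=TcpDiscoveryNode \[id=(" + UUID + ")", msg)
    existing = re.search(r"existingNode=(" + UUID + ")", msg)
    addrs = re.search(r"addrs=ArrayList \[([^\]]*)\]", msg)
    if not (local and existing and addrs):
        raise ValueError("message does not look like a duplicate-ID error")
    return local.group(1), existing.group(1), addrs.group(1).split(", ")


local_id, existing_id, addrs = duplicate_id_fields(message)
print(local_id == existing_id)  # True
print(addrs)                    # ['10.107.0.70', '127.0.0.1', '172.17.0.1']
```

In this log the `existingNode` value is identical to the restarting node's own `id`, and the address list includes the `172.17.0.1` address alongside the host's real one.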
If I restart both nodes simultaneously, they start and connect to each
other again.
The same thing happens when I try to add one more node.
So it seems that Ignite in Docker with `NetworkMode: host` picks up the default
Docker bridge (`docker0: 172.17.0.1`) as an advertised local address, which
certainly should be unique per node. For some reason it works when all cluster
nodes start or restart simultaneously, but that is a poor workaround when I need
to add another node to a live production cluster, or change a configuration
property and restart the nodes one by one.
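To make the overlap concrete, here is a small Python sketch that takes the two consistent IDs from the `top` output above and splits them into addresses shared by both nodes and addresses unique to each. The helper name `advertised_addresses` is hypothetical, and the sketch only shows the overlap; it does not prove that the shared address is what triggers the duplicate-ID check.

```python
def advertised_addresses(consistent_id: str) -> set:
    """Split 'addr1,addr2,...:port' into the set of addresses."""
    addr_part, _, _port = consistent_id.rpartition(":")
    return set(addr_part.split(","))


# Consistent IDs copied from the visor `top` output above.
consistent_ids = {
    "n0": "10.107.0.70,127.0.0.1,172.17.0.1:47500",
    "n1": "10.107.0.218,127.0.0.1,172.17.0.1:47500",
}

per_node = {name: advertised_addresses(cid) for name, cid in consistent_ids.items()}
shared = set.intersection(*per_node.values())
unique = {name: sorted(addrs - shared) for name, addrs in per_node.items()}

print(sorted(shared))  # ['127.0.0.1', '172.17.0.1']
print(unique)          # {'n0': ['10.107.0.70'], 'n1': ['10.107.0.218']}
```

Only one address per node is actually unique; the loopback and `docker0` addresses appear on both hosts.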
> Node fails with "Node with the same ID was found" while connecting to the
> cluster in Docker container if previous container was stopped
> ---------------------------------------------------------------------------------------------------------------------------------------
>
> Key: IGNITE-15996
> URL: https://issues.apache.org/jira/browse/IGNITE-15996
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.10
> Environment: Windows 10, Docker+WSL2
> Reporter: Ksenia Rybakova
> Priority: Major
> Attachments: ignite-47b5227b.0.log, ignite-c072978e.0.log,
> ignite-c62bc58e.0.log
>
>
> Node in Docker container fails to connect to existing cluster if previously
> connected node (container) was stopped:
> {noformat}
> [11:27:38,272][SEVERE][main][IgniteKernal] Got exception while starting (will rollback startup routine).
> class org.apache.ignite.IgniteCheckedException: Failed to start manager: GridManagerAdapter [enabled=true, name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
>     at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1990)
>     at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1331)
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2141)
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1787)
>     at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1172)
>     at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1066)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:952)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:851)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:721)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:690)
>     at org.apache.ignite.Ignition.start(Ignition.java:353)
>     at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:367)
> Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start SPI: TcpDiscoverySpi [addrRslvr=null, addressFilter=null, sockTimeout=5000, ackTimeout=5000, marsh=JdkMarshaller [clsFilter=org.apache.ignite.marshaller.MarshallerUtils$1@21f9277b], reconCnt=10, reconDelay=2000, maxAckTimeout=600000, soLinger=0, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null, skipAddrsRandomization=false]
>     at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:281)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:980)
>     at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1985)
>     ... 11 more
> Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with the same ID was found in node IDs history or existing node in topology has the same ID (fix configuration and restart local node) [localNode=TcpDiscoveryNode [id=c62bc58e-102a-4928-8e54-ac8a56bf4d44, consistentId=127.0.0.1,172.17.0.4:47500, addrs=ArrayList [127.0.0.1, 172.17.0.4], sockAddrs=HashSet [402b337a50dd/172.17.0.4:47500, /127.0.0.1:47500], discPort=47500, order=0, intOrder=3, lastExchangeTime=1637839658247, loc=true, ver=2.11.0#20210911-sha1:8f3f07d3, isClient=false], existingNode=c62bc58e-102a-4928-8e54-ac8a56bf4d44]
>     at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.duplicateIdError(TcpDiscoverySpi.java:2083)
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:1201)
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:473)
>     at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2207)
>     at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:278)
>     ... 13 more{noformat}
> Steps to reproduce:
> 1) Download ignite Docker image
> {code:java}
> docker pull apacheignite/ignite:2.11.0{code}
> 2) Start node 1 (local directory is mounted to save logs)
> {code:java}
> docker run -d -v ${PWD}/docker_ignite_w1:/opt/ignite/apache-ignite/work apacheignite/ignite:2.11.0
> c5219b095c93ec56731eec9fa871ffb722ddead987256198d76889f4a1a8ea3e{code}
> 3) Start node 2
> {code:java}
> docker run -d -v ${PWD}/docker_ignite_w2:/opt/ignite/apache-ignite/work apacheignite/ignite:2.11.0
> 65fdae68a40b2d3d17ab7e560320ef6757713d8efacbc25a26aecca03be6f975{code}
> 4) Stop container for node 2
> {code:java}
> docker stop 65fdae68a40b{code}
> 5) Start node 3
> {code:java}
> docker run -d -v ${PWD}/docker_ignite_w3:/opt/ignite/apache-ignite/work apacheignite/ignite:2.11.0{code}
> Expected: node 3 joins the cluster successfully
> Actual: node 3 fails with "IgniteSpiException: Node with the same ID was
> found in node IDs history or existing node in topology has the same ID."
> while id seems unique.
> Logs are attached:
> node 1 - ignite-47b5227b.0.log,
> node 2 - ignite-c072978e.0.log,
> node 3 - ignite-c62bc58e.0.log.