[ 
https://issues.apache.org/jira/browse/IGNITE-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542798#comment-17542798
 ] 

Yurii commented on IGNITE-15996:
--------------------------------

I see the same issue with a dockerized Ignite cluster on Linux, where the 
containers are configured with `NetworkMode: host`.

OS: `Ubuntu 20.04.3`, Docker version: `20.10.10`, Ignite image: 
`apacheignite/ignite:2.8.1`

My static discovery SPI configuration:

 
{code:xml}
        <property name="discoverySpi">
          <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
            <property name="ipFinder">
              <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                <property name="addresses">
                  <list>
                    <value>10.107.0.70:47500..47510</value>
                    <value>10.107.0.218:47500..47510</value>
                  </list>
                </property>
              </bean>
            </property>
          </bean>
        </property>{code}
I provision the cluster with Ansible, so the nodes start in parallel. In that 
case the cluster forms with 2 server nodes and everything is OK:
{code:java}
// shell
visor> top
Hosts: 4
+============================================================================================================================================================+
| Int./Ext. IPs |   Node ID8(@)    |           Node consistent ID            | Node Type |             OS              | CPUs |       MACs        | CPU Load |
+============================================================================================================================================================+
| 10.107.0.70   | 1: C7675661(@n0) | 10.107.0.70,127.0.0.1,172.17.0.1:47500  | Server    | Linux amd64 5.11.0-1020-aws | 4    | 02:42:01:27:D9:54 | 0.33 %   |
| 127.0.0.1     |                  |                                         |           |                             |      | 06:F6:EE:EA:D4:78 |          |
| 172.17.0.1    |                  |                                         |           |                             |      |                   |          |
+---------------+------------------+-----------------------------------------+-----------+-----------------------------+------+-------------------+----------+
| 10.107.0.218  | 1: 1EBB412F(@n1) | 10.107.0.218,127.0.0.1,172.17.0.1:47500 | Server    | Linux amd64 5.11.0-1020-aws | 4    | 02:42:D5:BF:B2:42 | 0.40 %   |
| 127.0.0.1     |                  |                                         |           |                             |      | 06:86:27:0D:5D:7C |          |
| 172.17.0.1    |                  |                                         |           |                             |      |                   |          |
+---------------+------------------+-----------------------------------------+-----------+-----------------------------+------+-------------------+----------+{code}
But if I restart one node of the cluster, it fails to rejoin with the 
following exception:
{code:java}
// shell

[06:25:08,581][SEVERE][main][IgniteKernal] Failed to start manager: GridManagerAdapter [enabled=true, name=o.a.i.i.managers.discovery.GridDiscoveryManager]
class org.apache.ignite.IgniteCheckedException: Failed to start SPI: TcpDiscoverySpi [addrRslvr=null, sockTimeout=5000, ackTimeout=5000, marsh=JdkMarshaller [clsFilter=org.apache.ignite.marshaller.MarshallerUtils$1@7d070ef5], reconCnt=10, reconDelay=2000, maxAckTimeout=600000, soLinger=5, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null, skipAddrsRandomization=false]
        at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:302)
        at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:943)
        at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1960)
        at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1276)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2045)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1703)
        at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1117)
        at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1035)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:921)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:820)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:690)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:659)
        at org.apache.ignite.Ignition.start(Ignition.java:346)
        at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:300)
Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with the same ID was found in node IDs history or existing node in topology has the same ID (fix configuration and restart local node) [localNode=TcpDiscoveryNode [id=0981cd22-4616-43a5-bccd-3e28762247fd, consistentId=10.107.0.70,127.0.0.1,172.17.0.1:47500, addrs=ArrayList [10.107.0.70, 127.0.0.1, 172.17.0.1], sockAddrs=HashSet [ip-172-17-0-1.eu-west-2.compute.internal/172.17.0.1:47500, /10.107.0.70:47500, /127.0.0.1:47500], discPort=47500, order=0, intOrder=0, lastExchangeTime=1653632688431, loc=true, ver=2.8.1#20200521-sha1:86422096, isClient=false], existingNode=0981cd22-4616-43a5-bccd-3e28762247fd]
        at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.duplicateIdError(TcpDiscoverySpi.java:1975)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:1112)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:427)
        at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2099)
        at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:299)
        ... 13 more
[06:25:08,582][SEVERE][main][IgniteKernal] Got exception while starting (will rollback startup routine).{code}
If I restart both nodes simultaneously, they start and connect to each other 
again.

The same failure occurs when I try to add one more node.

 

So it seems that Ignite in Docker with `NetworkMode: host` picks up the default 
Docker bridge address (`docker0: 172.17.0.1`) as one of its advertised local 
addresses. That address is the same on every host, although an advertised 
address should surely be unique. For some reason it works when all cluster 
nodes start or restart simultaneously, but that is a bad workaround when I am 
adding another node to a live production cluster, or need to change a 
configuration property and restart the nodes one at a time.
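
To make the overlap concrete, here is a minimal Python sketch that splits the two consistent IDs from the `visor> top` output above into their addresses and reports which ones both nodes advertise. It only illustrates the observation; it is not how Ignite itself compares nodes, and the helper names (`addresses`, `shared_addresses`, `unique_addresses`) are made up for this example.

```python
from typing import Dict, Set

# Consistent IDs copied from the `visor> top` output above.
CONSISTENT_IDS: Dict[str, str] = {
    "n0": "10.107.0.70,127.0.0.1,172.17.0.1:47500",
    "n1": "10.107.0.218,127.0.0.1,172.17.0.1:47500",
}


def addresses(consistent_id: str) -> Set[str]:
    """Split '<addr>,<addr>,...:<port>' into its set of addresses."""
    addr_part, _, _port = consistent_id.rpartition(":")
    return set(addr_part.split(","))


def shared_addresses(ids: Dict[str, str]) -> Set[str]:
    """Addresses advertised by every node in `ids`."""
    return set.intersection(*(addresses(cid) for cid in ids.values()))


def unique_addresses(ids: Dict[str, str]) -> Dict[str, Set[str]]:
    """Per node, the addresses that are not shared by all nodes."""
    shared = shared_addresses(ids)
    return {node: addresses(cid) - shared for node, cid in ids.items()}


if __name__ == "__main__":
    print("shared by all nodes:", sorted(shared_addresses(CONSISTENT_IDS)))
    for node, addrs in unique_addresses(CONSISTENT_IDS).items():
        print(node, "unique:", sorted(addrs))
```

With the two IDs above, only the `10.107.0.x` address distinguishes each node; the loopback and `docker0` addresses appear on both.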

 

> Node fails with "Node with the same ID was found" while connecting to the 
> cluster in Docker container if previous container was stopped
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-15996
>                 URL: https://issues.apache.org/jira/browse/IGNITE-15996
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.10
>         Environment: Windows 10, Docker+WSL2
>            Reporter: Ksenia Rybakova
>            Priority: Major
>         Attachments: ignite-47b5227b.0.log, ignite-c072978e.0.log, 
> ignite-c62bc58e.0.log
>
>
> Node in Docker container fails to connect to existing cluster if previously 
> connected node (container) was stopped:
> {noformat}
> [11:27:38,272][SEVERE][main][IgniteKernal] Got exception while starting (will rollback startup routine).
> class org.apache.ignite.IgniteCheckedException: Failed to start manager: GridManagerAdapter [enabled=true, name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
>     at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1990)
>     at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1331)
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2141)
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1787)
>     at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1172)
>     at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1066)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:952)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:851)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:721)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:690)
>     at org.apache.ignite.Ignition.start(Ignition.java:353)
>     at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:367)
> Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start SPI: TcpDiscoverySpi [addrRslvr=null, addressFilter=null, sockTimeout=5000, ackTimeout=5000, marsh=JdkMarshaller [clsFilter=org.apache.ignite.marshaller.MarshallerUtils$1@21f9277b], reconCnt=10, reconDelay=2000, maxAckTimeout=600000, soLinger=0, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null, skipAddrsRandomization=false]
>     at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:281)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:980)
>     at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1985)
>     ... 11 more
> Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with the same ID was found in node IDs history or existing node in topology has the same ID (fix configuration and restart local node) [localNode=TcpDiscoveryNode [id=c62bc58e-102a-4928-8e54-ac8a56bf4d44, consistentId=127.0.0.1,172.17.0.4:47500, addrs=ArrayList [127.0.0.1, 172.17.0.4], sockAddrs=HashSet [402b337a50dd/172.17.0.4:47500, /127.0.0.1:47500], discPort=47500, order=0, intOrder=3, lastExchangeTime=1637839658247, loc=true, ver=2.11.0#20210911-sha1:8f3f07d3, isClient=false], existingNode=c62bc58e-102a-4928-8e54-ac8a56bf4d44]
>     at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.duplicateIdError(TcpDiscoverySpi.java:2083)
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:1201)
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:473)
>     at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2207)
>     at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:278)
>     ... 13 more{noformat}
> Steps to reproduce:
> 1) Download ignite Docker image
> {code:java}
> docker pull apacheignite/ignite:2.11.0{code}
> 2) Start node 1 (local directory is mounted to save logs)
> {code:java}
> docker run -d -v ${PWD}/docker_ignite_w1:/opt/ignite/apache-ignite/work apacheignite/ignite:2.11.0
> c5219b095c93ec56731eec9fa871ffb722ddead987256198d76889f4a1a8ea3e{code}
> 3) Start node 2
> {code:java}
> docker run -d -v ${PWD}/docker_ignite_w2:/opt/ignite/apache-ignite/work apacheignite/ignite:2.11.0
> 65fdae68a40b2d3d17ab7e560320ef6757713d8efacbc25a26aecca03be6f975{code}
> 4) Stop container for node 2
> {code:java}
> docker stop 65fdae68a40b{code}
> 5) Start node 3
> {code:java}
> docker run -d -v ${PWD}/docker_ignite_w3:/opt/ignite/apache-ignite/work apacheignite/ignite:2.11.0{code}
> Expected: node 3 joins the cluster successfully
> Actual: node 3 fails with "IgniteSpiException: Node with the same ID was 
> found in node IDs history or existing node in topology has the same ID." 
> while id seems unique. 
> Logs are attached:
> node 1 - ignite-47b5227b.0.log,
> node 2 - ignite-c072978e.0.log,
> node 3 - ignite-c62bc58e.0.log.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
