[
https://issues.apache.org/jira/browse/IGNITE-8633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ilya Kasnacheev updated IGNITE-8633:
------------------------------------
Attachment: hanging-node.log
baseline-node.log
> Node fails to bail out of wrong BLT, instead hanging around indefinitely
> ------------------------------------------------------------------------
>
> Key: IGNITE-8633
> URL: https://issues.apache.org/jira/browse/IGNITE-8633
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.4
> Reporter: Ilya Kasnacheev
> Assignee: Ilya Kasnacheev
> Priority: Major
> Attachments: 8633.zip, baseline-node.log, hanging-node.log
>
>
> Follow-up on
> https://stackoverflow.com/questions/50234056/how-to-give-multiple-static-ip-in-apache-ignite-cache-configuration-xml-file/50270676?noredirect=1#comment88095814_50270676
> but not quite the same.
> I have three nodes: A, B and C.
> I started A and C and activated the cluster.
> Then I stopped them both, started B and activated it on its own.
> Now I have two BLT clusters: (A, C) and (B).
> However, when I start B and then try to launch node A or C, I get
> inconsistent behavior:
> When I launch C, I get the error:
> {code}
> org.apache.ignite.spi.IgniteSpiException: BaselineTopology of joining node
> (8c1e210f-52bb-424f-9c7c-a2e7b1bab546 ) is not compatible with
> BaselineTopology in the cluster. Branching history of cluster BlT
> ([-1349069127]) doesn't contain branching point hash of joining node BlT
> (631694798). Consider cleaning persistent storage of the node and adding it
> to the cluster again.
> {code}
> But when I launch A, it never enters the topology, yet it never fails either.
> Moreover, A and B keep pinging each other indefinitely (A's log first,
> then B's):
> {code}
> [20:16:38,596][WARNING][main][TcpDiscoverySpi] Node has not been connected to
> topology and will repeat join process. Check remote nodes logs for possible
> error messages. Note that large topology may require significant time to
> start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if
> getting this message on the starting nodes [networkTimeout=5000]
> [20:17:29,514][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery
> accepted incoming connection [rmtAddr=/172.25.1.36, rmtPort=49030]
> [20:17:29,522][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery
> spawning a new thread for connection [rmtAddr=/172.25.1.36, rmtPort=49030]
> [20:17:29,523][INFO][tcp-disco-sock-reader-#26][TcpDiscoverySpi] Started
> serving remote node connection [rmtAddr=/172.25.1.36:49030, rmtPort=49030]
> [20:17:29,524][INFO][tcp-disco-sock-reader-#26][TcpDiscoverySpi] Received
> ping request from the remote node
> [rmtNodeId=37104137-a21e-4b6f-a70b-09164300bbfc, rmtAddr=/172.25.1.36:49030,
> rmtPort=49030]
> [20:17:29,525][INFO][tcp-disco-sock-reader-#26][TcpDiscoverySpi] Finished
> writing ping response [rmtNodeId=37104137-a21e-4b6f-a70b-09164300bbfc,
> rmtAddr=/172.25.1.36:49030, rmtPort=49030]
> [20:17:29,526][INFO][tcp-disco-sock-reader-#26][TcpDiscoverySpi] Finished
> serving remote node connection [rmtAddr=/172.25.1.36:49030, rmtPort=49030
> [20:18:30,733][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery
> accepted incoming connection [rmtAddr=/172.25.1.36, rmtPort=50857]
> [20:18:30,733][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery
> spawning a new thread for connection [rmtAddr=/172.25.1.36, rmtPort=50857]
> [20:18:30,733][INFO][tcp-disco-sock-reader-#47][TcpDiscoverySpi] Started
> serving remote node connection [rmtAddr=/172.25.1.36:50857, rmtPort=50857]
> [20:18:30,734][INFO][tcp-disco-sock-reader-#47][TcpDiscoverySpi] Received
> ping request from the remote node
> [rmtNodeId=37104137-a21e-4b6f-a70b-09164300bbfc, rmtAddr=/172.25.1.36:50857,
> rmtPort=50857]
> [20:18:30,734][INFO][tcp-disco-sock-reader-#47][TcpDiscoverySpi] Finished
> writing ping response [rmtNodeId=37104137-a21e-4b6f-a70b-09164300bbfc,
> rmtAddr=/172.25.1.36:50857, rmtPort=50857]
> [20:18:30,734][INFO][tcp-disco-sock-reader-#47][TcpDiscoverySpi] Finished
> serving remote node connection [rmtAddr=/172.25.1.36:50857, rmtPort=50857
> {code}
> {code}
> [20:16:28,793][INFO][tcp-disco-msg-worker-#3][GridSnapshotAwareClusterStateProcessorImpl]
> Received state change finish message: true
> [20:16:28,803][INFO][exchange-worker-#62][time] Finished exchange init
> [topVer=AffinityTopologyVersion [topVer=1, minorTopVer=1], crd=true]
> [20:16:28,812][INFO][exchange-worker-#62][GridCachePartitionExchangeManager]
> Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
> [topVer=1, minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT,
> node=37104137-a21e-4b6f-a70b-09164300bbfc]
> [20:16:28,818][INFO][sys-#68][GridSnapshotAwareClusterStateProcessorImpl]
> Successfully performed final activation steps
> [nodeId=37104137-a21e-4b6f-a70b-09164300bbfc, client=false,
> topVer=AffinityTopologyVersion [topVer=1, minorTopVer=1]]
> [20:16:33,571][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery
> accepted incoming connection [rmtAddr=/172.25.1.35, rmtPort=42500]
> [20:16:33,579][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery
> spawning a new thread for connection [rmtAddr=/172.25.1.35, rmtPort=42500]
> [20:16:33,580][INFO][tcp-disco-sock-reader-#9][TcpDiscoverySpi] Started
> serving remote node connection [rmtAddr=/172.25.1.35:42500, rmtPort=42500]
> [20:16:33,592][INFO][tcp-disco-sock-reader-#9][TcpDiscoverySpi] Finished
> serving remote node connection [rmtAddr=/172.25.1.35:42500, rmtPort=42500
> [20:16:39,801][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery
> accepted incoming connection [rmtAddr=/172.25.1.35, rmtPort=42714]
> [20:16:39,801][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery
> spawning a new thread for connection [rmtAddr=/172.25.1.35, rmtPort=42714]
> [20:16:39,802][INFO][tcp-disco-sock-reader-#10][TcpDiscoverySpi] Started
> serving remote node connection [rmtAddr=/172.25.1.35:42714, rmtPort=42714]
> [20:16:39,806][INFO][tcp-disco-sock-reader-#10][TcpDiscoverySpi] Finished
> serving remote node connection [rmtAddr=/172.25.1.35:42714, rmtPort=42714
> {code}
> I don't think this is expected behaviour. I will attach config and work
> directories.
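
For readers unfamiliar with the error quoted above, the check it describes can be modelled as a set-membership test: the cluster keeps a branching history of hashes, and a joining node is rejected when its branching point hash is not in that history. The sketch below is a toy model built only from the wording of the error message. The class and method names are hypothetical and this is not Ignite's actual implementation. The two hash values are the ones reported for node C.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Toy model of the check described by the error message; NOT Ignite's actual code. */
class BltCompatibilitySketch {
    /**
     * Hypothetical helper: a joining node is treated as compatible only if its
     * branching point hash appears in the cluster's branching history.
     */
    static boolean isCompatible(Set<Integer> clusterBranchingHistory, int joiningBranchingPointHash) {
        return clusterBranchingHistory.contains(joiningBranchingPointHash);
    }

    /**
     * Hypothetical helper: the outcome the report argues for, namely that an
     * incompatible node should fail fast rather than retry the join forever.
     */
    static String joinOutcome(Set<Integer> clusterBranchingHistory, int joiningBranchingPointHash) {
        return isCompatible(clusterBranchingHistory, joiningBranchingPointHash) ? "JOIN" : "FAIL_FAST";
    }

    public static void main(String[] args) {
        // Branching history of the running cluster (B), taken from the error message.
        Set<Integer> history = new LinkedHashSet<>();
        history.add(-1349069127);

        // Branching point hash of the joining node C, taken from the error message.
        System.out.println(joinOutcome(history, 631694798));
    }
}
```

In this model node C is rejected immediately, which matches the exception it reports. The report does not show the hashes for node A, so the sketch says nothing about why A retries instead of failing.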