[jira] [Created] (IGNITE-8633) Node fails to bail out of wrong BLT, instead hanging around indefinitely

Ilya Kasnacheev (JIRA) Mon, 28 May 2018 10:26:23 -0700

Ilya Kasnacheev created IGNITE-8633:
---------------------------------------


             Summary: Node fails to bail out of wrong BLT, instead hanging around indefinitely
                 Key: IGNITE-8633
                 URL: https://issues.apache.org/jira/browse/IGNITE-8633
             Project: Ignite
          Issue Type: Bug
    Affects Versions: 2.4
            Reporter: Ilya Kasnacheev
            Assignee: Stanislav Lukyanov


This is a follow-up to
https://stackoverflow.com/questions/50234056/how-to-give-multiple-static-ip-in-apache-ignite-cache-configuration-xml-file/50270676?noredirect=1#comment88095814_50270676
but the problem here is not quite the same.

I have three nodes: A, B and C.
I started A and C and activated the cluster.
Then I stopped them both, started B alone and activated it.
Now I have two BLT (BaselineTopology) clusters: (A, C) and (B).
However, when I start B and then try to launch node A or C, I get inconsistent 
behavior.
When I launch C, it fails with the error:
{code}
org.apache.ignite.spi.IgniteSpiException: BaselineTopology of joining node 
(8c1e210f-52bb-424f-9c7c-a2e7b1bab546 ) is not compatible with BaselineTopology 
in the cluster. Branching history of cluster BlT ([-1349069127]) doesn't 
contain branching point hash of joining node BlT (631694798). Consider cleaning 
persistent storage of the node and adding it to the cluster again.
{code}
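
For reference, the check that this error message describes can be modeled as a simple set-membership test. This is only an illustrative sketch, not Ignite's implementation: `BaselineSketch`, `check_join` and `try_join` are hypothetical names, and only the two hash values are taken from the message above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaselineSketch:
    """Hypothetical stand-in for the BLT metadata of a node or a cluster."""
    branching_point_hash: int
    branching_history: frozenset = frozenset()


def check_join(cluster: BaselineSketch, joining: BaselineSketch) -> None:
    """Reject the join if the cluster history lacks the joiner's branching point."""
    if joining.branching_point_hash not in cluster.branching_history:
        raise RuntimeError(
            "BaselineTopology of joining node is not compatible with "
            "BaselineTopology in the cluster"
        )


def try_join(cluster: BaselineSketch, joining: BaselineSketch) -> str:
    """Return 'joined' or 'rejected' instead of raising, for easy comparison."""
    try:
        check_join(cluster, joining)
        return "joined"
    except RuntimeError:
        return "rejected"


# Hash values copied from the error message above.
cluster_b = BaselineSketch(-1349069127, frozenset({-1349069127}))
node_c = BaselineSketch(631694798, frozenset({631694798}))

print(try_join(cluster_b, node_c))
```

Under this model a node from the (A, C) cluster is rejected outright when it tries to join (B), which is what C does. A was activated together with C, so it would presumably be rejected the same way.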

But when I launch A, it neither joins the topology nor fails. Moreover, 
A and B keep pinging each other indefinitely (logs from both nodes follow):
{code}
[20:16:38,596][WARNING][main][TcpDiscoverySpi] Node has not been connected to 
topology and will repeat join process. Check remote nodes logs for possible 
error messages. Note that large topology may require significant time to start. 
Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting 
this message on the starting nodes [networkTimeout=5000]
[20:17:29,514][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted 
incoming connection [rmtAddr=/172.25.1.36, rmtPort=49030]
[20:17:29,522][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning 
a new thread for connection [rmtAddr=/172.25.1.36, rmtPort=49030]
[20:17:29,523][INFO][tcp-disco-sock-reader-#26][TcpDiscoverySpi] Started 
serving remote node connection [rmtAddr=/172.25.1.36:49030, rmtPort=49030]
[20:17:29,524][INFO][tcp-disco-sock-reader-#26][TcpDiscoverySpi] Received ping 
request from the remote node [rmtNodeId=37104137-a21e-4b6f-a70b-09164300bbfc, 
rmtAddr=/172.25.1.36:49030, rmtPort=49030]
[20:17:29,525][INFO][tcp-disco-sock-reader-#26][TcpDiscoverySpi] Finished 
writing ping response [rmtNodeId=37104137-a21e-4b6f-a70b-09164300bbfc, 
rmtAddr=/172.25.1.36:49030, rmtPort=49030]
[20:17:29,526][INFO][tcp-disco-sock-reader-#26][TcpDiscoverySpi] Finished 
serving remote node connection [rmtAddr=/172.25.1.36:49030, rmtPort=49030
[20:18:30,733][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted 
incoming connection [rmtAddr=/172.25.1.36, rmtPort=50857]
[20:18:30,733][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning 
a new thread for connection [rmtAddr=/172.25.1.36, rmtPort=50857]
[20:18:30,733][INFO][tcp-disco-sock-reader-#47][TcpDiscoverySpi] Started 
serving remote node connection [rmtAddr=/172.25.1.36:50857, rmtPort=50857]
[20:18:30,734][INFO][tcp-disco-sock-reader-#47][TcpDiscoverySpi] Received ping 
request from the remote node [rmtNodeId=37104137-a21e-4b6f-a70b-09164300bbfc, 
rmtAddr=/172.25.1.36:50857, rmtPort=50857]
[20:18:30,734][INFO][tcp-disco-sock-reader-#47][TcpDiscoverySpi] Finished 
writing ping response [rmtNodeId=37104137-a21e-4b6f-a70b-09164300bbfc, 
rmtAddr=/172.25.1.36:50857, rmtPort=50857]
[20:18:30,734][INFO][tcp-disco-sock-reader-#47][TcpDiscoverySpi] Finished 
serving remote node connection [rmtAddr=/172.25.1.36:50857, rmtPort=50857
{code}

Log from B:
{code}
[20:16:28,793][INFO][tcp-disco-msg-worker-#3][GridSnapshotAwareClusterStateProcessorImpl]
 Received state change finish message: true
[20:16:28,803][INFO][exchange-worker-#62][time] Finished exchange init 
[topVer=AffinityTopologyVersion [topVer=1, minorTopVer=1], crd=true]
[20:16:28,812][INFO][exchange-worker-#62][GridCachePartitionExchangeManager] 
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
[topVer=1, minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, 
node=37104137-a21e-4b6f-a70b-09164300bbfc]
[20:16:28,818][INFO][sys-#68][GridSnapshotAwareClusterStateProcessorImpl] 
Successfully performed final activation steps 
[nodeId=37104137-a21e-4b6f-a70b-09164300bbfc, client=false, 
topVer=AffinityTopologyVersion [topVer=1, minorTopVer=1]]
[20:16:33,571][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted 
incoming connection [rmtAddr=/172.25.1.35, rmtPort=42500]
[20:16:33,579][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning 
a new thread for connection [rmtAddr=/172.25.1.35, rmtPort=42500]
[20:16:33,580][INFO][tcp-disco-sock-reader-#9][TcpDiscoverySpi] Started serving 
remote node connection [rmtAddr=/172.25.1.35:42500, rmtPort=42500]
[20:16:33,592][INFO][tcp-disco-sock-reader-#9][TcpDiscoverySpi] Finished 
serving remote node connection [rmtAddr=/172.25.1.35:42500, rmtPort=42500
[20:16:39,801][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted 
incoming connection [rmtAddr=/172.25.1.35, rmtPort=42714]
[20:16:39,801][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning 
a new thread for connection [rmtAddr=/172.25.1.35, rmtPort=42714]
[20:16:39,802][INFO][tcp-disco-sock-reader-#10][TcpDiscoverySpi] Started 
serving remote node connection [rmtAddr=/172.25.1.35:42714, rmtPort=42714]
[20:16:39,806][INFO][tcp-disco-sock-reader-#10][TcpDiscoverySpi] Finished 
serving remote node connection [rmtAddr=/172.25.1.35:42714, rmtPort=42714
{code}

I don't think this is expected behavior: A should fail with the same error as 
C instead of retrying the join forever. I will attach the config and work 
directories.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
