[ https://issues.apache.org/jira/browse/IGNITE-10935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755861#comment-16755861 ]
Alexey Goncharuk commented on IGNITE-10935:
-------------------------------------------
Several issues were discovered and fixed in the attached PR:
1) Pending messages were incorrectly initialized during processing of
NodeAddedMessage. A non-null discardId caused the SkipIterator to skip all
pending messages immediately after the join.
2) The collection of failed nodes was not set on pending messages, causing the
new coordinator to skip correct NodeAddedMessage processing.
3) A node could skip processing of the second NodeAddedMessage if the local
node order was greater than the order in the received message.
4) HandshakeRequest did not check which node was responding to the request,
and the receiving node did not check the previous node ID.
5) When a node decided to segment itself in the CONNECTING state, it failed to
do so, leaving a zombie node in the ring.
6) Promotion of the local node to the first coordinator was done in a
non-thread-safe way with regard to the ring message worker.
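
To illustrate item 1, here is a minimal sketch of how a "skip until discard ID"
iterator can swallow an entire queue. It is a simplified model under stated
assumptions, not the code from the PR: the class PendingSkipSketch, the drain
helper and the string message IDs are hypothetical, and the assumption is that
the iterator skips every message up to and including the one whose ID equals
discardId.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Simplified, hypothetical model of a pending-message queue; not the Ignite implementation. */
public class PendingSkipSketch {
    /** Iterates over message IDs, skipping everything up to and including discardId. */
    static final class SkipIterator implements Iterator<String> {
        private final Iterator<String> it;
        private final String discardId;
        private boolean skipping;
        private String next;

        SkipIterator(List<String> msgIds, String discardId) {
            this.it = msgIds.iterator();
            this.discardId = discardId;
            this.skipping = discardId != null; // null means "nothing to discard"
            advance();
        }

        /** Moves to the next message that is not being skipped, if any. */
        private void advance() {
            next = null;
            while (it.hasNext()) {
                String id = it.next();
                if (skipping) {
                    if (id.equals(discardId))
                        skipping = false; // the discarded message itself is also skipped
                    continue;
                }
                next = id;
                break;
            }
        }

        @Override public boolean hasNext() {
            return next != null;
        }

        @Override public String next() {
            if (next == null)
                throw new NoSuchElementException();
            String res = next;
            advance();
            return res;
        }
    }

    /** Collects everything the iterator yields for the given discard ID. */
    static List<String> drain(List<String> msgIds, String discardId) {
        List<String> out = new ArrayList<>();
        new SkipIterator(msgIds, discardId).forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<String> pending = Arrays.asList("m1", "m2", "m3");
        System.out.println(drain(pending, null));    // [m1, m2, m3]
        System.out.println(drain(pending, "m1"));    // [m2, m3]
        System.out.println(drain(pending, "stale")); // []
    }
}
```

In this model, a discard ID that matches nothing in the queue leaves the
iterator in skipping mode until the queue is exhausted, so no pending message
is delivered. That is the kind of behavior described in item 1.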
> "Invalid node order" error occurs while cycle cluster nodes restart
> -------------------------------------------------------------------
>
> Key: IGNITE-10935
> URL: https://issues.apache.org/jira/browse/IGNITE-10935
> Project: Ignite
> Issue Type: Bug
> Reporter: Dmitry Sherstobitov
> Assignee: Alexey Goncharuk
> Priority: Critical
> Fix For: 2.8
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Same scenario as in https://issues.apache.org/jira/browse/IGNITE-10878
> {code:java}
> Exception in thread "tcp-disco-msg-worker-#2" java.lang.AssertionError:
> Invalid node order: TcpDiscoveryNode
> [id=9a332aa3-3d60-469a-9ff5-3deee8918451, addrs=[0:0:0:0:0:0:0:1%lo,
> 127.0.0.1, 172.17.0.1, 172.25.1.40], sockAddrs=[/172.25.1.40:47501,
> /0:0:0:0:0:0:0:1%lo:47501, /127.0.0.1:47501, /172.17.0.1:47501],
> discPort=47501, order=0, intOrder=16, lastExchangeTime=1547486771047,
> loc=false, ver=2.4.13#20190114-sha1:a7667ae6, isClient=false]
> at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:51)
> at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:48)
> at org.apache.ignite.internal.util.lang.GridFunc.isAll(GridFunc.java:2030)
> at org.apache.ignite.internal.util.IgniteUtils.arrayList(IgniteUtils.java:9635)
> at org.apache.ignite.internal.util.IgniteUtils.arrayList(IgniteUtils.java:9608)
> at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.nodes(TcpDiscoveryNodesRing.java:625)
> at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.visibleNodes(TcpDiscoveryNodesRing.java:145)
> at org.apache.ignite.spi.discovery.tcp.ServerImpl.notifyDiscovery(ServerImpl.java:1429)
> at org.apache.ignite.spi.discovery.tcp.ServerImpl.access$2400(ServerImpl.java:176)
> at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processNodeAddFinishedMessage(ServerImpl.java:4565)
> at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2732)
> at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2554)
> at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6955)
> at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2634)
> at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)