[ https://issues.apache.org/jira/browse/IGNITE-10935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755861#comment-16755861 ]

Alexey Goncharuk commented on IGNITE-10935:
-------------------------------------------

Several issues were discovered and fixed in the attached PR:
1) Pending messages were incorrectly initialized during processing of
NodeAddedMessage. A non-null discardId caused the SkipIterator to skip all
pending messages immediately after the join.
2) The collection of failed nodes was not set on pending messages, causing a new
coordinator to skip the correct processing of NodeAddedMessage.
3) A node could skip processing a second NodeAddedMessage if the local node order
was greater than the order in the received message.
4) HandshakeRequest did not check which node was responding to the request,
and the receiving node did not check the previous node ID.
5) When a node decided to segment itself in the CONNECTING state, it failed to do
so, leaving a zombie node in the ring.
6) Promotion of the local node to the first coordinator was not thread-safe
with respect to the ring message worker.
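Item 1 can be pictured with a simplified model of a pending-message queue whose iterator skips every message up to and including a `discardId`. This is a hypothetical sketch, not Ignite's `PendingMessages`/`SkipIterator` code. The names `PendingQueueSketch`, `PendingQueue`, `discard` and `unprocessed` are invented for illustration. It also assumes, as item 1 implies, that a `discardId` that is absent from the queue makes the scan skip everything.

```java
import java.util.ArrayList;
import java.util.List;

public class PendingQueueSketch {
    // Hypothetical, simplified pending-message queue; NOT Ignite's PendingMessages class.
    static final class PendingQueue {
        private final List<String> msgIds = new ArrayList<>();

        // ID of the last message already processed; messages up to and including it are skipped.
        private String discardId;

        void add(String id) { msgIds.add(id); }

        void discard(String id) { discardId = id; }

        // Returns the messages that still need processing.
        List<String> unprocessed() {
            if (discardId == null)
                return new ArrayList<>(msgIds);

            int idx = msgIds.indexOf(discardId);

            // Assumption for illustration: if discardId is not in the queue, the scan never
            // reaches its stopping point, so every pending message is skipped.
            if (idx < 0)
                return new ArrayList<>();

            return new ArrayList<>(msgIds.subList(idx + 1, msgIds.size()));
        }
    }

    public static void main(String[] args) {
        // A freshly joined node initialized with a stale, non-null discardId skips everything.
        PendingQueue buggy = new PendingQueue();
        buggy.discard("m0"); // ID that is not in this node's queue
        buggy.add("m1");
        buggy.add("m2");
        System.out.println("buggy: " + buggy.unprocessed()); // []

        // Initializing with a null discardId keeps all pending messages.
        PendingQueue fixed = new PendingQueue();
        fixed.add("m1");
        fixed.add("m2");
        System.out.println("fixed: " + fixed.unprocessed()); // [m1, m2]

        // Discarding a message that is in the queue skips only up to that message.
        fixed.discard("m1");
        System.out.println("after discard: " + fixed.unprocessed()); // [m2]
    }
}
```

The model keeps the "skip until discardId" rule from the comment and shows why initializing it to a non-null value on join hides every pending message from the new node.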

> "Invalid node order" error occurs while cycle cluster nodes restart
> -------------------------------------------------------------------
>
>                 Key: IGNITE-10935
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10935
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Dmitry Sherstobitov
>            Assignee: Alexey Goncharuk
>            Priority: Critical
>             Fix For: 2.8
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Same scenario as in https://issues.apache.org/jira/browse/IGNITE-10878
> {code:java}
> Exception in thread "tcp-disco-msg-worker-#2" java.lang.AssertionError: Invalid node order: TcpDiscoveryNode [id=9a332aa3-3d60-469a-9ff5-3deee8918451, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.17.0.1, 172.25.1.40], sockAddrs=[/172.25.1.40:47501, /0:0:0:0:0:0:0:1%lo:47501, /127.0.0.1:47501, /172.17.0.1:47501], discPort=47501, order=0, intOrder=16, lastExchangeTime=1547486771047, loc=false, ver=2.4.13#20190114-sha1:a7667ae6, isClient=false]
> at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:51)
> at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:48)
> at org.apache.ignite.internal.util.lang.GridFunc.isAll(GridFunc.java:2030)
> at org.apache.ignite.internal.util.IgniteUtils.arrayList(IgniteUtils.java:9635)
> at org.apache.ignite.internal.util.IgniteUtils.arrayList(IgniteUtils.java:9608)
> at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.nodes(TcpDiscoveryNodesRing.java:625)
> at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.visibleNodes(TcpDiscoveryNodesRing.java:145)
> at org.apache.ignite.spi.discovery.tcp.ServerImpl.notifyDiscovery(ServerImpl.java:1429)
> at org.apache.ignite.spi.discovery.tcp.ServerImpl.access$2400(ServerImpl.java:176)
> at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processNodeAddFinishedMessage(ServerImpl.java:4565)
> at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2732)
> at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2554)
> at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6955)
> at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2634)
> at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> {code}
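In the trace, the assertion fires inside a node filter applied by `TcpDiscoveryNodesRing.visibleNodes()`, reached from `processNodeAddFinishedMessage`. The offending node has `order=0`, which presumably means its topology order had not been assigned yet. The sketch below shows the kind of invariant check involved. `VisibleNodesSketch`, `Node` and `visibleNodes` are hypothetical stand-ins rather than Ignite's classes, and the exact condition Ignite checks is an assumption. The sketch throws `AssertionError` explicitly, so it fires even when the JVM runs without `-ea`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VisibleNodesSketch {
    // Hypothetical stand-in for a discovery node; only the fields needed here.
    static final class Node {
        final String id;
        final long order;

        Node(String id, long order) {
            this.id = id;
            this.order = order;
        }

        @Override public String toString() {
            return "Node [id=" + id + ", order=" + order + "]";
        }
    }

    // Returns the nodes that may be exposed to discovery listeners.
    // Assumption: a node with a non-positive order has no topology position yet,
    // so finding one here means an invariant was broken upstream.
    static List<Node> visibleNodes(List<Node> ring) {
        List<Node> res = new ArrayList<>();

        for (Node n : ring) {
            if (n.order <= 0)
                throw new AssertionError("Invalid node order: " + n);

            res.add(n);
        }

        return res;
    }

    public static void main(String[] args) {
        List<Node> ok = Arrays.asList(new Node("a", 1), new Node("b", 2));
        System.out.println(visibleNodes(ok).size()); // 2

        List<Node> broken = Arrays.asList(new Node("a", 1), new Node("c", 0));
        try {
            visibleNodes(broken);
        }
        catch (AssertionError e) {
            System.out.println(e.getMessage()); // Invalid node order: Node [id=c, order=0]
        }
    }
}
```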



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
