[
https://issues.apache.org/jira/browse/IGNITE-11364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784505#comment-16784505
]
Alexey Goncharuk commented on IGNITE-11364:
-------------------------------------------
[~sergey-chugunov], looks good, thanks for untangling this issue! Merged to
master.
> Segmenting node can break ring topology
> ----------------------------------------
>
> Key: IGNITE-11364
> URL: https://issues.apache.org/jira/browse/IGNITE-11364
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.5, 2.6, 2.7
> Reporter: Ivan Daschinskiy
> Assignee: Sergey Chugunov
> Priority: Blocker
> Fix For: 2.8
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Segmentation caused by a partial network drop, e.g. by applying iptables
> rules, can break the ring.
> Scenario:
> On each machine there are two nodes: a client and a server.
> Let's draw a diagram (only server nodes for brevity; they were started
> before the clients).
> => grid915 => ....... => grid947 => grid945 => grid703 => ..skip 12 nodes... => grid952 => grid946.
> On the grid945 machine we drop incoming/outgoing connections with iptables.
> While the connections are dropped, grid945 sends a
> TcpDiscoveryStatusCheckMessage but cannot deliver it to grid703 or to the 12
> nodes that follow it; some node further along the ring accepts the message
> with a failedNodes collection listing those 13 nodes. When grid947 receives
> this message, it skips these 13 nodes in
> org.apache.ignite.spi.discovery.tcp.ServerImpl.RingMessageWorker#sendMessageAcrossRing.
> So we see this situation in the topology:
> .................. => grid947 => grid952=>..............
> ........................................//
> grid703=>................=>grid662
> The nodes mentioned above are not considered failed by the grid, so the
> topology ends up as a tail connected to the ring.
>
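For illustration, here is a minimal, greatly simplified sketch of the
forwarding step described above (hypothetical class and method names; not the
actual ServerImpl code): when forwarding a message across the ring, the worker
skips every node listed in the message's failedNodes collection and connects
to the first remaining node, which is what splices the 13 nodes out.

    import java.util.List;
    import java.util.Set;

    class RingForwardSketch {
        /**
         * Returns the node the local node will try to connect to next.
         *
         * @param ring   Server nodes in ring order, starting after the local node.
         * @param failed Nodes treated as failed (the message's failedNodes plus
         *               any node the local node itself cannot reach).
         */
        static String nextNodeAcrossRing(List<String> ring, Set<String> failed) {
            for (String next : ring) {
                if (failed.contains(next))
                    continue; // Skipped with no further liveness check.

                return next;
            }

            return null; // No live node left.
        }

        public static void main(String[] args) {
            // Ring order after grid947: grid945, then grid703 and its 12
            // successors (elided here), then grid952 and grid946.
            List<String> ring = List.of("grid945", "grid703", "grid952", "grid946");

            // grid945 is unreachable from grid947, and grid703 (plus the 12
            // elided nodes) arrived in the failedNodes collection.
            Set<String> failed = Set.of("grid945", "grid703");

            // Prints "grid952": grid947 connects straight to grid952, leaving
            // grid703 and its successors attached to the ring only as a tail.
            System.out.println(nextNodeAcrossRing(ring, failed));
        }
    }

Because the skipped nodes are never subsequently marked failed, they remain
visible in the topology as a tail instead of being removed from it.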