Ivan Daschinskiy created IGNITE-11364:
-----------------------------------------

             Summary: Segmenting node can cause ring topology to break
                 Key: IGNITE-11364
                 URL: https://issues.apache.org/jira/browse/IGNITE-11364
             Project: Ignite
          Issue Type: Bug
    Affects Versions: 2.7, 2.6, 2.5
            Reporter: Ivan Daschinskiy
             Fix For: 2.8


Segmenting a node by a partial network drop (for example, by applying iptables
rules) can break the ring topology.
Scenario:
Each machine runs two nodes: a client and a server.

Let's draw a diagram (server nodes only, for brevity; they were started before
the clients).

=> grid915 => ....... => grid947 => grid945 => grid703 => ...skip 12 nodes... => grid952 => grid946

On the grid945 machine, we drop incoming and outgoing connections using iptables.

While the connections are being dropped, grid945 sends a TcpDiscoveryStatusCheckMessage.
It cannot deliver the message to grid703 or to the 12 nodes after it, but a node further
along the ring accepts it with these 13 nodes listed in its failedNodes collection. The
message then reaches grid947, which skips all 13 nodes in
org.apache.ignite.spi.discovery.tcp.ServerImpl.RingMessageWorker#sendMessageAcrossRing.
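The skipping step can be illustrated with a small, hypothetical model in Java. This is not Ignite's ServerImpl code: the class RingSkipSketch, the method nextNode, the placeholder names skipped1 and skipped2 (standing in for the 12 omitted nodes), and the assumption that grid947 also skips the segmented grid945 are all illustrative. The model only assumes that a sender forwards to the first node after itself, in ring order, that is not in its failed set.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical model of next-node selection in a discovery ring (not Ignite's ServerImpl code). */
public class RingSkipSketch {

    /**
     * Returns the first node after {@code self}, in ring order and wrapping around,
     * that is not in {@code failed}; returns null if every other node is in {@code failed}.
     */
    static String nextNode(List<String> ring, String self, Set<String> failed) {
        int idx = ring.indexOf(self);
        if (idx < 0)
            throw new IllegalArgumentException("Unknown node: " + self);

        for (int i = 1; i < ring.size(); i++) {
            String candidate = ring.get((idx + i) % ring.size());
            if (!failed.contains(candidate))
                return candidate;
        }
        return null;
    }

    public static void main(String[] args) {
        // Server ring from the report; skipped1/skipped2 are hypothetical stand-ins for the 12 omitted nodes.
        List<String> ring = List.of("grid915", "grid947", "grid945", "grid703",
            "skipped1", "skipped2", "grid662", "grid952", "grid946");

        // Assumed skip set at grid947: the segmented grid945 plus the nodes reported in failedNodes.
        Set<String> failed = Set.of("grid945", "grid703", "skipped1", "skipped2", "grid662");

        // grid947 jumps over every "failed" node straight to grid952.
        System.out.println("grid947 -> " + nextNode(ring, "grid947", failed));

        // grid662 has not seen any failure, so it still forwards to grid952 as well.
        System.out.println("grid662 -> " + nextNode(ring, "grid662", Set.of()));
    }
}
```

Running main prints `grid947 -> grid952` and `grid662 -> grid952`: both the node that skipped the reported failures and the last node of the skipped chain point at grid952, which is the fork described in the report.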

As a result, the topology looks like this:

... => grid947 => grid952
                     ^
                     |
grid703 => ... => grid662

The topology does not consider the nodes from grid703 to grid662 failed.
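One way to make this inconsistency visible is to follow each node's next-node pointer from a node that is still in the ring and list the topology members that are never reached. The sketch below is a hypothetical diagnostic, not an Ignite API: OrphanCheckSketch, findOrphans, the start node grid915, and the placeholder skipped1 (standing in for the hidden chain) are assumptions, and the segmented grid945 is left out of the model for simplicity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical diagnostic: finds topology nodes that the ring no longer passes through. */
public class OrphanCheckSketch {

    /**
     * Follows next-node pointers from {@code start} until a node repeats or has no pointer,
     * then returns, in topology order, every topology node that was never visited.
     */
    static List<String> findOrphans(List<String> topology, Map<String, String> next, String start) {
        Set<String> visited = new HashSet<>();
        String cur = start;
        while (cur != null && visited.add(cur))
            cur = next.get(cur);

        List<String> orphans = new ArrayList<>();
        for (String node : topology)
            if (!visited.contains(node))
                orphans.add(node);
        return orphans;
    }

    public static void main(String[] args) {
        // Topology as the cluster still sees it (grid945 omitted; skipped1 stands in for the hidden chain).
        List<String> topology = List.of("grid915", "grid947", "grid703", "skipped1",
            "grid662", "grid952", "grid946");

        // Next-node pointers after the broken status check.
        Map<String, String> next = new HashMap<>();
        next.put("grid915", "grid947");
        next.put("grid947", "grid952"); // bypasses the nodes listed in failedNodes
        next.put("grid952", "grid946");
        next.put("grid946", "grid915"); // closes the visible ring
        next.put("grid703", "skipped1");
        next.put("skipped1", "grid662");
        next.put("grid662", "grid952"); // the hidden chain still feeds into the ring

        System.out.println(findOrphans(topology, next, "grid915"));
    }
}
```

With these pointers, main prints `[grid703, skipped1, grid662]`; for a healthy ring, where every node is reachable from the start node, the returned list is empty.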

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)