[jira] [Created] (IGNITE-19410) Node failure in case multiple nodes join and leave a cluster simultaneously and security is enabled.

Mikhail Petrov (Jira) Wed, 03 May 2023 05:35:56 -0700

Mikhail Petrov created IGNITE-19410:
---------------------------------------


             Summary: Node failure in case multiple nodes join and leave a cluster simultaneously and security is enabled.
                 Key: IGNITE-19410
                 URL: https://issues.apache.org/jira/browse/IGNITE-19410
             Project: Ignite
          Issue Type: Bug
            Reporter: Mikhail Petrov
         Attachments: NodeSecurityContextTest.java

When nodes with security enabled join and leave the cluster simultaneously, 
the joining nodes can fail with the following exception:


{code:java}
[2023-05-03T14:54:31,208][ERROR][disco-notifier-worker-#332%ignite.NodeSecurityContextTest2%][IgniteTestResources] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Failed to find security context for subject with given ID : 4725544a-f144-4486-a705-46b2ac200011]]
java.lang.IllegalStateException: Failed to find security context for subject with given ID : 4725544a-f144-4486-a705-46b2ac200011
    at org.apache.ignite.internal.processors.security.IgniteSecurityProcessor.withContext(IgniteSecurityProcessor.java:164) ~[classes/:?]
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$3$SecurityAwareNotificationTask.run(GridDiscoveryManager.java:949) ~[classes/:?]
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2822) ~[classes/:?]
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2860) [classes/:?]
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125) [classes/:?]
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_351]
{code}
Reproducer is attached.

Simplified steps that lead to the failure:

1. The client node sends an arbitrary discovery message that produces an 
acknowledgement message once it has been processed by all cluster nodes.
2. The client node gracefully leaves the cluster.
3. The new node joins the cluster and receives a topology snapshot that does 
not include the left client node.
4. The new node receives the acknowledgement for the message from step 1 and 
fails while processing it, because the message originator node is not listed 
in the current discovery cache or in the discovery cache history (see 
IgniteSecurityProcessor#withContext(java.util.UUID)). This happens because the 
GridDiscoveryManager#historicalNode method is currently aware only of the 
topology history accumulated after the node joined the cluster. The complete 
cluster topology history that exists at the time a new node joins the cluster 
is stored in GridDiscoveryManager#topHist and is not taken into account by the 
GridDiscoveryManager#historicalNode method.
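
The lookup gap described in step 4 can be illustrated with a toy model. This 
is only a sketch and not Ignite code: the class TopologyViewSketch, its fields 
and both lookup methods are hypothetical names introduced for illustration, 
and the "extended" lookup shows one possible direction rather than the actual 
fix.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/** Toy model of a node's view of cluster topology (hypothetical, not Ignite code). */
class TopologyViewSketch {
    /** Nodes alive in the current topology version. */
    private final Set<UUID> current = new HashSet<>();

    /** Nodes seen in topology versions observed after this node joined. */
    private final Set<UUID> postJoinHistory = new HashSet<>();

    /** Nodes from the history handed over at join time (the role topHist plays in the report). */
    private final Set<UUID> joinTimeHistory = new HashSet<>();

    TopologyViewSketch(Collection<UUID> snapshot, Collection<UUID> historyAtJoin) {
        current.addAll(snapshot);
        postJoinHistory.addAll(snapshot);
        joinTimeHistory.addAll(historyAtJoin);
    }

    /** Lookup limited to what was seen after the join, as the report describes. */
    boolean knownAfterJoinOnly(UUID nodeId) {
        return current.contains(nodeId) || postJoinHistory.contains(nodeId);
    }

    /** Lookup that also consults the history received at join time. */
    boolean knownWithJoinTimeHistory(UUID nodeId) {
        return knownAfterJoinOnly(nodeId) || joinTimeHistory.contains(nodeId);
    }

    public static void main(String[] args) {
        UUID server = UUID.randomUUID();
        UUID leftClient = UUID.randomUUID();

        // Step 3: the snapshot excludes the client, but the join-time history still records it.
        TopologyViewSketch view =
            new TopologyViewSketch(Arrays.asList(server), Arrays.asList(server, leftClient));

        // Step 4: the acknowledgement names the departed client as its originator.
        System.out.println("post-join lookup finds client: " + view.knownAfterJoinOnly(leftClient));
        System.out.println("extended lookup finds client: " + view.knownWithJoinTimeHistory(leftClient));
    }
}
```

In this model the post-join lookup misses the departed client, which 
corresponds to the point where the security context lookup fails, while the 
extended lookup still finds it.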

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
