Vyacheslav Koptilin created IGNITE-16789:
--------------------------------------------

             Summary: Failure to dynamically create a new cache can be a cause of NullPointerException/AssertionError in the discovery thread
                 Key: IGNITE-16789
                 URL: https://issues.apache.org/jira/browse/IGNITE-16789
             Project: Ignite
          Issue Type: Bug
            Reporter: Vyacheslav Koptilin
            Assignee: Vyacheslav Koptilin


Simultaneously creating and removing a cache with the same name may lead to the
following AssertionError (or a NullPointerException when assertions are
disabled) in the disco-notifier thread, which triggers the FailureHandler.

{noformat}
[2022-04-04 14:22:41,571][ERROR][disco-notifier-worker-#36%cache.IgniteDynamicCacheStartFailTest0%][GridDiscoveryManager] Exception in discovery notifier worker thread.
java.lang.AssertionError: Dynamic cache descriptor is missing [cacheName=TestDynamicCache]
        at org.apache.ignite.internal.processors.cache.ClusterCachesInfo.onCacheChangeRequested(ClusterCachesInfo.java:570)
        at org.apache.ignite.internal.processors.cache.GridCacheProcessor.onCustomEvent(GridCacheProcessor.java:4307)
        at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:680)
        at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.access$7500(GridDiscoveryManager.java:559)
        at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4$NotificationTask.run(GridDiscoveryManager.java:994)
        at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2852)
        at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2890)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
        at java.lang.Thread.run(Thread.java:748)
{noformat}
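Why the same failure shows up either as the AssertionError above or as a NullPointerException can be illustrated with a minimal assert-then-dereference sketch. It assumes, without reproducing the actual code at ClusterCachesInfo.java:570, that the descriptor is looked up by cache name, checked with a Java `assert`, and then used; `DescriptorLookupDemo`, `CacheDescriptor`, `registered` and `onFailure` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simplification of a descriptor lookup; NOT the actual
// ClusterCachesInfo code from Apache Ignite.
final class DescriptorLookupDemo {
    /** Hypothetical stand-in for a registered cache descriptor. */
    static final class CacheDescriptor {
        private final String cacheName;

        CacheDescriptor(String cacheName) {
            this.cacheName = cacheName;
        }

        String cacheName() {
            return cacheName;
        }
    }

    /** Registered descriptors keyed by cache name. */
    static final Map<String, CacheDescriptor> registered = new HashMap<>();

    /** Assert-then-dereference: AssertionError with -ea, NullPointerException without it. */
    static String onFailure(String cacheName) {
        CacheDescriptor desc = registered.get(cacheName);

        assert desc != null : "Dynamic cache descriptor is missing [cacheName=" + cacheName + ']';

        // With assertions disabled (the JVM default) a missing descriptor fails here instead.
        return desc.cacheName();
    }

    public static void main(String[] args) {
        try {
            onFailure("TestDynamicCache");
        }
        catch (AssertionError | NullPointerException e) {
            System.out.println(e.getClass().getName());
        }
    }
}
```

Launched with `java -ea`, `main` prints `java.lang.AssertionError`; with assertions disabled it prints `java.lang.NullPointerException`.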

It looks like the issue is caused by concurrently starting and stopping caches
with the same name.
The following scenario results in the AssertionError (when assertions are
disabled, it leads to the NullPointerException mentioned above):
 - the user starts a new cache with the name "A"
 - the DynamicCacheChangeRequest is sent over the cluster ring
 - every node that receives this message updates its list of registered cache
descriptors (see ClusterCachesInfo.onCacheChangeRequested(DynamicCacheChangeBatch,
AffinityTopologyVersion))
 - a node initiates a new partition map exchange (PME)
 - the user tries to stop the cache with the same name "A"
 - a new DynamicCacheChangeRequest is sent, which removes the cache from the list
of registered caches
 - at this point, the previous exchange (the PME related to the cache start)
fails for some reason
 - the DynamicCacheChangeFailureMessage is sent over the ring and tries to find
the required cache descriptor on every node, but the descriptor has already been
removed.
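The message order in the scenario above can be replayed on a single node in a small model. This is a hypothetical sketch, not Ignite code: `CacheStartStopRaceDemo`, `Msg`, `Kind` and the `descriptors` map are illustrative stand-ins for DynamicCacheChangeRequest, DynamicCacheChangeFailureMessage and the list of registered cache descriptors, and the missing descriptor is reported as an `IllegalStateException` rather than the real AssertionError/NullPointerException. The `tolerant` flag shows one possible defensive handling (skip the rollback when the descriptor is already gone); it is an assumption, not necessarily the fix applied for IGNITE-16789.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-node model of the scenario above; not Ignite code.
final class CacheStartStopRaceDemo {
    /** START/STOP stand in for DynamicCacheChangeRequest, START_FAILED for DynamicCacheChangeFailureMessage. */
    enum Kind { START, STOP, START_FAILED }

    static final class Msg {
        final Kind kind;
        final String cacheName;

        Msg(Kind kind, String cacheName) {
            this.kind = kind;
            this.cacheName = cacheName;
        }
    }

    /** Registered cache descriptors keyed by cache name (a descriptor is simplified to a String). */
    private final Map<String, String> descriptors = new HashMap<>();

    /** Caches whose failure message was ignored because the descriptor was already removed. */
    final List<String> skipped = new ArrayList<>();

    boolean isRegistered(String cacheName) {
        return descriptors.containsKey(cacheName);
    }

    /** Applies one message; {@code tolerant} selects the defensive handling of START_FAILED. */
    void onMessage(Msg msg, boolean tolerant) {
        switch (msg.kind) {
            case START:
                descriptors.put(msg.cacheName, "descriptor:" + msg.cacheName);
                break;
            case STOP:
                descriptors.remove(msg.cacheName);
                break;
            case START_FAILED:
                onStartFailed(msg.cacheName, tolerant);
                break;
        }
    }

    private void onStartFailed(String cacheName, boolean tolerant) {
        if (descriptors.get(cacheName) == null) {
            if (!tolerant)
                throw new IllegalStateException("Dynamic cache descriptor is missing [cacheName=" + cacheName + ']');

            // The cache was already stopped by a later request, so there is nothing to roll back.
            skipped.add(cacheName);
            return;
        }

        descriptors.remove(cacheName); // Roll back the registration made by the failed start.
    }

    /** Message order from the scenario: start "A", stop "A", then the start-related PME fails. */
    static List<Msg> scenario() {
        List<Msg> msgs = new ArrayList<>();
        msgs.add(new Msg(Kind.START, "A"));
        msgs.add(new Msg(Kind.STOP, "A"));
        msgs.add(new Msg(Kind.START_FAILED, "A"));
        return msgs;
    }

    public static void main(String[] args) {
        CacheStartStopRaceDemo strict = new CacheStartStopRaceDemo();
        try {
            for (Msg m : scenario())
                strict.onMessage(m, false);
        }
        catch (IllegalStateException e) {
            System.out.println("strict: " + e.getMessage());
        }

        CacheStartStopRaceDemo tolerant = new CacheStartStopRaceDemo();
        for (Msg m : scenario())
            tolerant.onMessage(m, true);
        System.out.println("tolerant: skipped=" + tolerant.skipped);
    }
}
```

Running `main` prints `strict: Dynamic cache descriptor is missing [cacheName=A]` followed by `tolerant: skipped=[A]`. Without the intermediate STOP, the failure message finds the descriptor and simply rolls the start back.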



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
