Vyacheslav Koptilin created IGNITE-16789:
--------------------------------------------
Summary: Failure to dynamically create a new cache can be a cause
of NullPointerException/AssertionError in the discovery thread
Key: IGNITE-16789
URL: https://issues.apache.org/jira/browse/IGNITE-16789
Project: Ignite
Issue Type: Bug
Reporter: Vyacheslav Koptilin
Assignee: Vyacheslav Koptilin
Simultaneously creating and removing a cache with the same name may lead to the
following AssertionError (or to a NullPointerException when assertions are
disabled) in the disco-notifier thread, which in turn triggers the
FailureHandler.
{noformat}
[2022-04-04 14:22:41,571][ERROR][disco-notifier-worker-#36%cache.IgniteDynamicCacheStartFailTest0%][GridDiscoveryManager] Exception in discovery notifier worker thread.
java.lang.AssertionError: Dynamic cache descriptor is missing [cacheName=TestDynamicCache]
    at org.apache.ignite.internal.processors.cache.ClusterCachesInfo.onCacheChangeRequested(ClusterCachesInfo.java:570)
    at org.apache.ignite.internal.processors.cache.GridCacheProcessor.onCustomEvent(GridCacheProcessor.java:4307)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:680)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.access$7500(GridDiscoveryManager.java:559)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4$NotificationTask.run(GridDiscoveryManager.java:994)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2852)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2890)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
    at java.lang.Thread.run(Thread.java:748)
{noformat}
The issue appears to be caused by concurrently starting and stopping caches
with the same name.
The following scenario results in the AssertionError (when assertions are
disabled, it leads to the NullPointerException mentioned above):
- the user starts a new cache with the name "A"
- the DynamicCacheChangeRequest is sent over the cluster ring
- every node that receives this message updates its list of registered
cache descriptors (see
ClusterCachesInfo.onCacheChangeRequested(DynamicCacheChangeBatch,
AffinityTopologyVersion))
- a node initiates a new partition map exchange
- the user tries to stop the cache with the same name "A"
- a new DynamicCacheChangeRequest is sent, and processing it removes the
descriptor of "A" from the list of registered caches
- at this point, the previous exchange (the PME related to the cache start)
fails for some reason
- the DynamicCacheChangeFailureMessage is sent over the ring and tries to find
the required cache descriptor on every node, but the descriptor has already
been removed.
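
The ordering of events above can be replayed in a small single-threaded toy
model. This is only a sketch and not Ignite code: the class
CacheDescriptorRaceSketch, its methods, and the string "descriptors" are
hypothetical stand-ins for the per-node registry kept by ClusterCachesInfo. It
shows why the lookup done for the failure message finds nothing once the stop
request has been processed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of the scenario. Hypothetical names; this is not Ignite code. */
public class CacheDescriptorRaceSketch {
    /** Stand-in for one node's list of registered cache descriptors. */
    private final Map<String, String> registeredCaches = new ConcurrentHashMap<>();

    /** Models handling of a cache start request: the descriptor is registered. */
    void onStartRequest(String cacheName) {
        registeredCaches.put(cacheName, "descriptor-of-" + cacheName);
    }

    /** Models handling of a cache stop request: the descriptor is removed. */
    void onStopRequest(String cacheName) {
        registeredCaches.remove(cacheName);
    }

    /**
     * Models handling of the start-failure message: it looks up the descriptor
     * of the cache whose start failed. Returns null if it is already gone.
     */
    String findDescriptorForFailure(String cacheName) {
        return registeredCaches.get(cacheName);
    }

    public static void main(String[] args) {
        CacheDescriptorRaceSketch node = new CacheDescriptorRaceSketch();

        node.onStartRequest("A"); // start request for "A" arrives
        node.onStopRequest("A");  // stop request for "A" arrives before the start exchange completes

        // The start exchange fails, and the failure message is processed last.
        String desc = node.findDescriptorForFailure("A");

        // With assertions enabled, code that asserts desc != null fails here;
        // with assertions disabled, dereferencing desc throws a NullPointerException.
        System.out.println("descriptor missing: " + (desc == null));
    }
}
```

In this model the failure handling always sees a missing descriptor, because
the stop request is processed between the start request and the failure
message, which matches the order described in the list above.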