Alexandr Kuramshin created IGNITE-6491:
------------------------------------------

             Summary: Race in TopologyValidator.validate() and EVT_NODE_LEFT listener calls (split-brain activator)
                 Key: IGNITE-6491
                 URL: https://issues.apache.org/jira/browse/IGNITE-6491
             Project: Ignite
          Issue Type: Bug
          Components: cache, general
    Affects Versions: 2.1
            Reporter: Alexandr Kuramshin
            Assignee: Alexandr Kuramshin
             Fix For: 2.2


The following incorrect cache {{validate}}/{{put}} sequence may occur:

When a node leaves, a {{GridDhtPartitionsExchangeFuture}} is created by the {{disco-event-worker}} thread.

Then the {{exchange-worker}} thread initializes and completes the exchange, which invokes the topology validator:

{noformat}
Split-brain detected [cacheName=test40, activatorTopVer=0, cacheTopVer=14]
        at org.apache.ignite.internal.util.IgniteUtils.dumpStack(IgniteUtils.java:1141)
        at org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest$SplitAwareTopologyValidator.validate(IgniteTopologyValidatorGridSplitCacheTest.java:307)
        at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCacheGroup(GridDhtTopologyFutureAdapter.java:64)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:1456)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:115)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:450)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:668)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2278)
{noformat}

The validation result, {{false}}, is stored in {{grpValidRes}}.
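The caching behavior described above can be modeled with the minimal sketch below. It assumes that the validator is evaluated once, when the exchange completes, and that every later cache operation only reads the stored result. Only the name {{grpValidRes}} comes from this issue; it is modeled here as a map keyed by cache group name, and the class, the methods {{onExchangeDone}}, {{isTopologyValid}} and {{runScenario}}, and the "valid by default" rule are all hypothetical. Ignite's actual types and keys may differ.

{noformat}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical model of the validation cache: the validator runs once when the
 * exchange completes, and every later cache operation only reads the stored result.
 * Only the name grpValidRes comes from the issue; everything else is illustrative.
 */
public class GrpValidResSketch {
    /** Cache group name -> result of the topology validator for the current topology. */
    static final Map<String, Boolean> grpValidRes = new ConcurrentHashMap<>();

    /** Models the exchange future completion: the only place the validator is invoked. */
    static void onExchangeDone(String grpName, BooleanSupplier validator) {
        grpValidRes.put(grpName, validator.getAsBoolean());
    }

    /** Models the per-operation check done on put(); unknown groups are treated as valid (assumption). */
    static boolean isTopologyValid(String grpName) {
        return grpValidRes.getOrDefault(grpName, Boolean.TRUE);
    }

    /** Runs one exchange followed by several "puts"; returns how often the validator was called. */
    static int runScenario() {
        AtomicInteger calls = new AtomicInteger();

        onExchangeDone("test40", () -> {
            calls.incrementAndGet();
            return false; // split-brain detected, node is not (yet) the segment activator
        });

        // Every put only consults the cached value; the validator is not re-run.
        for (int i = 0; i < 3; i++)
            System.out.println("put #" + i + " allowed: " + isTopologyValid("test40"));

        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("validator calls: " + runScenario());
    }
}
{noformat}

Running {{main}} prints {{allowed: false}} for every put and a single validator call: once the value is cached, nothing in this model re-evaluates it for the same exchange.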

After some delay, the {{disco-event-worker}} thread notifies the {{EVT_NODE_LEFT}} listener:

{noformat}
java.lang.Exception: Node is segment activator [cacheName=test40, activatorTopVer=14]
        at org.apache.ignite.internal.util.IgniteUtils.dumpStack(IgniteUtils.java:1141)
        at org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest$SplitAwareTopologyValidator$2.apply(IgniteTopologyValidatorGridSplitCacheTest.java:360)
        at org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest$SplitAwareTopologyValidator$2.apply(IgniteTopologyValidatorGridSplitCacheTest.java:349)
        at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager$UserListenerWrapper.onEvent(GridEventStorageManager.java:1463)
        at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:859)
        at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:844)
        at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.record0(GridEventStorageManager.java:341)
        at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.record(GridEventStorageManager.java:307)
        at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.recordEvent(GridDiscoveryManager.java:2478)
        at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body0(GridDiscoveryManager.java:2684)
        at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body(GridDiscoveryManager.java:2507)
{noformat}

After this invocation, {{SplitAwareTopologyValidator.validate}} would return {{true}}. However, it has already been invoked for this exchange, and its result, {{false}}, is cached in {{grpValidRes}}.
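The interleaving above can be replayed deterministically with the hypothetical sketch below. Two threads named after {{exchange-worker}} and {{disco-event-worker}} share a simplified validator, and a {{CountDownLatch}} forces the listener to run only after the exchange has cached its result, which is the problematic order. {{SplitAwareValidator}} is a stand-in that ignores topology versions, not the test's actual {{SplitAwareTopologyValidator}}, and {{replay}} is an illustrative name.

{noformat}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

/**
 * Hypothetical replay of the race: the exchange caches the validation result
 * before the EVT_NODE_LEFT listener marks this node as the segment activator.
 */
public class SplitBrainRaceSketch {
    /** Simplified stand-in for the test's SplitAwareTopologyValidator. */
    static class SplitAwareValidator {
        /** Set by the node-left listener when this node becomes the segment activator. */
        volatile boolean activator;

        boolean validate() {
            return activator; // valid only once this node is the activator
        }
    }

    static final Map<String, Boolean> grpValidRes = new ConcurrentHashMap<>();

    /** Returns {cachedResult, freshResult} observed after both threads have finished. */
    static boolean[] replay() {
        SplitAwareValidator validator = new SplitAwareValidator();
        CountDownLatch exchangeDone = new CountDownLatch(1);

        Thread exchangeWorker = new Thread(() -> {
            grpValidRes.put("test40", validator.validate()); // caches false
            exchangeDone.countDown();
        }, "exchange-worker");

        Thread discoEventWorker = new Thread(() -> {
            try {
                exchangeDone.await(); // "after some delay": runs after the result is cached
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            validator.activator = true; // listener: node is segment activator
        }, "disco-event-worker");

        exchangeWorker.start();
        discoEventWorker.start();

        try {
            exchangeWorker.join();
            discoEventWorker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for workers", e);
        }

        return new boolean[] {grpValidRes.get("test40"), validator.validate()};
    }

    public static void main(String[] args) {
        boolean[] res = replay();
        System.out.println("cached=" + res[0] + ", fresh=" + res[1]);
    }
}
{noformat}

With the latch enforcing this order, the program prints {{cached=false, fresh=true}}: the validator would now accept the topology, but cache operations keep reading the stale cached value. Without the latch, the outcome depends on thread scheduling, which matches the intermittent failures described in this issue.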

As a result, all subsequent calls to {{cache.put}} fail:

{noformat}
Test failed.
java.lang.RuntimeException: tryPut() failed [gridName=cache.IgniteTopologyValidatorGridSplitCacheTest0]
        at org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest.tryPut(IgniteTopologyValidatorGridSplitCacheTest.java:262)
        at org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest.testTopologyValidator(IgniteTopologyValidatorGridSplitCacheTest.java:182)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at junit.framework.TestCase.runTest(TestCase.java:176)
        at org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:2000)
        at org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:132)
        at org.apache.ignite.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:1915)
        at java.lang.Thread.run(Thread.java:748)
Caused by: javax.cache.CacheException: class org.apache.ignite.IgniteCheckedException: Failed to perform cache operation (cache topology is not valid): test40
        at org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1327)
        at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.cacheException(IgniteCacheProxyImpl.java:1672)
        at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(IgniteCacheProxyImpl.java:1032)
        at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(GatewayProtectedCacheProxy.java:872)
        at org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest.tryPut(IgniteTopologyValidatorGridSplitCacheTest.java:252)
        ... 10 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to perform cache operation (cache topology is not valid): test40
        at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCache(GridDhtTopologyFutureAdapter.java:112)
        at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapOnTopology(GridNearAtomicSingleUpdateFuture.java:415)
        at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.map(GridNearAtomicAbstractUpdateFuture.java:248)
        at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update0(GridDhtAtomicCache.java:1170)
        at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.put0(GridDhtAtomicCache.java:659)
        at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2334)
        at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2311)
        at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(IgniteCacheProxyImpl.java:1029)
        ... 12 more
{noformat}

The updated test {{IgniteTopologyValidatorGridSplitCacheTest}} fails frequently on my laptop when run with 8 nodes and 100 caches.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
