Alexey Goncharuk created IGNITE-13101: -----------------------------------------
Summary: Metastore may leave uncompleted write futures during node stop Key: IGNITE-13101 URL: https://issues.apache.org/jira/browse/IGNITE-13101 Project: Ignite Issue Type: Bug Reporter: Alexey Goncharuk I've got the following thread-dump (only relevant parts are retained) during one of the teamcity runs: {code} "sys-#103862%baseline.IgniteStableBaselineBinObjFieldsQuerySelfTest0%" #107048 prio=5 os_prio=0 tid=0x00007fa2d8009800 nid=0x480d waiting on condition [0x00007fa1d1cdc000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304) at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178) at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141) at org.apache.ignite.internal.processors.metric.GridMetricManager.remove(GridMetricManager.java:411) at org.apache.ignite.internal.processors.cache.CacheGroupMetricsImpl.remove(CacheGroupMetricsImpl.java:497) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.cleanup(GridCacheProcessor.java:512) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.stopCacheGroup(GridCacheProcessor.java:2901) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.stopCacheGroup(GridCacheProcessor.java:2889) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.processCacheStopRequestOnExchangeDone(GridCacheProcessor.java:2781) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.onExchangeDone(GridCacheProcessor.java:2878) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:2431) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.finishExchangeOnCoordinator(GridDhtPartitionsExchangeFuture.java:3832) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onAllReceived(GridDhtPartitionsExchangeFuture.java:3608) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processSingleMessage(GridDhtPartitionsExchangeFuture.java:3207) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$200(GridDhtPartitionsExchangeFuture.java:154) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:2994) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:2982) at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:399) at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:354) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveSingleMessage(GridDhtPartitionsExchangeFuture.java:2982) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processSinglePartitionUpdate(GridCachePartitionExchangeManager.java:1989) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.preprocessSingleMessage(GridCachePartitionExchangeManager.java:524) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1100(GridCachePartitionExchangeManager.java:182) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:407) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:389) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3715) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3694) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109) at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308) at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1847) at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1472) at org.apache.ignite.internal.managers.communication.GridIoManager.access$5200(GridIoManager.java:229) at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1367) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) "main" #1 prio=5 os_prio=0 tid=0x00007fa43c010000 nid=0x26df waiting on condition [0x00007fa443e2a000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.writeLock(GridCacheIoManager.java:526) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onKernalStop0(GridCacheIoManager.java:556) at org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.onKernalStop(GridCacheSharedManagerAdapter.java:135) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.onKernalStop(GridCacheProcessor.java:801) at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2581) at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2529) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2669) - locked <0x00000000f7adbd98> (a org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2632) at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:339) at org.apache.ignite.Ignition.stop(Ignition.java:230) at org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1338) at org.apache.ignite.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1386) at org.apache.ignite.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1364) at org.apache.ignite.internal.processors.database.baseline.IgniteStableBaselineBinObjFieldsQuerySelfTest.afterTest(IgniteStableBaselineBinObjFieldsQuerySelfTest.java:66) at org.apache.ignite.testframework.junits.GridAbstractTest.cleanUpTestEnviroment(GridAbstractTest.java:723) at org.apache.ignite.testframework.junits.GridAbstractTest.runTest(GridAbstractTest.java:2284) at org.apache.ignite.testframework.junits.GridAbstractTest.access$600(GridAbstractTest.java:176) at org.apache.ignite.testframework.junits.GridAbstractTest$2.evaluate(GridAbstractTest.java:214) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) at org.apache.ignite.testframework.junits.SystemPropertiesRule.lambda$methodStatement$1(SystemPropertiesRule.java:110) at org.apache.ignite.testframework.junits.SystemPropertiesRule$$Lambda$9/68866931.evaluate(Unknown Source) at org.apache.ignite.testframework.junits.DelegatingJUnitStatement.evaluate(DelegatingJUnitStatement.java:49) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.apache.ignite.testframework.junits.GridAbstractTest.evaluateInsideFixture(GridAbstractTest.java:2780) at org.apache.ignite.testframework.junits.GridAbstractTest.access$500(GridAbstractTest.java:176) at org.apache.ignite.testframework.junits.GridAbstractTest$BeforeFirstAndAfterLastTestRule$1.evaluate(GridAbstractTest.java:2760) at org.apache.ignite.testframework.junits.SystemPropertiesRule.lambda$classStatement$0(SystemPropertiesRule.java:94) at org.apache.ignite.testframework.junits.SystemPropertiesRule$$Lambda$7/1709225221.evaluate(Unknown Source) at org.apache.ignite.testframework.junits.DelegatingJUnitStatement.evaluate(DelegatingJUnitStatement.java:49) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.junit.runners.Suite.runChild(Suite.java:127) at org.junit.runners.Suite.runChild(Suite.java:26) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray2(ReflectionUtils.java:186) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:161) at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:84) at org.apache.maven.plugin.surefire.InPluginVMSurefireStarter.runSuitesInProcess(InPluginVMSurefireStarter.java:87) at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1144) at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1002) at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:848) at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:210) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:156) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:148) at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117) at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81) at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:56) at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128) at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:305) at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:192) at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:105) at org.apache.maven.cli.MavenCli.execute(MavenCli.java:956) at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:288) at org.apache.maven.cli.MavenCli.main(MavenCli.java:192) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:282) at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:225) at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:406) at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:347) {code} Looks like there is a way for the metastore to return a future that will never complete after discovery SPI is stopped (the main thread is in {{GridCacheIoManager.onKernalStop()}}, therefore discovery may already stopped processing messages). Reproduced this in Queries 1 suite, with {{IgniteStableBaselineBinObjFieldsQuerySelfTest}} -- This message was sent by Atlassian Jira (v8.3.4#803005)