>> Does the issue reproduce in
>> subsequent runs?
Unfortunately, no. We performed 30+ runs without "success".

>> I think we can add an assertion to
>> GridDhtLocalPartition#destroy() method to check that reservations is 0
OK, I will check and merge in case of success.
Created an issue to handle this [1].

[1] https://issues.apache.org/jira/browse/IGNITE-12524

On Thu, Jan 9, 2020 at 1:46 PM Alexey Goncharuk <[email protected]> wrote:

> Hello Anton,
>
> Thanks for digging into this. The logic with checking the
> reservations count seems fishy to me as well, so I have no objections to
> the suggested change. This "if" statement does not answer why the partition
> was being destroyed during the commit, though. Does the issue reproduce in
> subsequent runs?
>
> The logic around reserve/release seems ok to me, however, the
> eviction/renting code looks overly complicated, perhaps, there is a bug
> somewhere there? I think we can add an assertion to
> GridDhtLocalPartition#destroy() method to check that reservations is 0 when
> this method is called (there is a check for EVICTED state already there)
>
> --AG
>
> Thu, Jan 9, 2020 at 09:45, Anton Vinogradov <[email protected]>:
>
> > Folks,
> > Yardstick run (opt-serial-put-get-1-backup) failed with interesting
> > exception:
> > Critical system error detected. Will be handled accordingly to configured
> > handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> > super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> > [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> > failureCtx=FailureContext [type=CRITICAL_ERROR, err=class
> > o.a.i.i.transactions.IgniteTxHeuristicCheckedException: Committing a
> > transaction has produced runtime exception]]
> > class org.apache.ignite.internal.transactions.IgniteTxHeuristicCheckedException:
> > Committing a transaction has produced runtime exception
> >     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.heuristicException(IgniteTxAdapter.java:800)
> >     at org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.commitIfLocked(GridDistributedTxRemoteAdapter.java:838)
> >     at org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.commitRemoteTx(GridDistributedTxRemoteAdapter.java:893)
> >     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finish(IgniteTxHandler.java:1452)
> >     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processDhtTxFinishRequest(IgniteTxHandler.java:1375)
> >     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$600(IgniteTxHandler.java:123)
> >     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$7.apply(IgniteTxHandler.java:241)
> >     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$7.apply(IgniteTxHandler.java:239)
> >     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
> >     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
> >     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
> >     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
> >     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
> >     at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
> >     at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1843)
> >     at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1468)
> >     at org.apache.ignite.internal.managers.communication.GridIoManager.access$5200(GridIoManager.java:229)
> >     at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1365)
> >     at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:555)
> >     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> >     at java.lang.Thread.run(Thread.java:748)
> > Caused by: java.lang.IllegalStateException: Tree is being concurrently
> > destroyed: tx-p-470##CacheData
> >     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.checkDestroyed(BPlusTree.java:1011)
> >     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1831)
> >     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1696)
> >     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1679)
> >     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:441)
> >     at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4288)
> >     at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4262)
> >     at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerSet(GridCacheMapEntry.java:1540)
> >     at org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.commitIfLocked(GridDistributedTxRemoteAdapter.java:675)
> >     ... 19 more
> >
> > It seems BPlusTree was destroyed between
> > GridDistributedTxRemoteAdapter.java:545 and
> > GridDistributedTxRemoteAdapter.java:675 while the partition was reserved.
> >
> > See the full log [1] for details.
> >
> > During the investigation, weird code was found:
> > private void release0(int sizeChange) {
> >     while (true) {
> >         long state = this.state.get();
> >
> >         int reservations = getReservations(state);
> >
> >         if (reservations == 0) // How can it be zero at release attempt?
> >             return;
> >
> > I've replaced this weird code with an assertion [2] and checked at TeamCity
> > twice, nothing failed.
> >
> > So, questions:
> > 1) Any idea why we are able to have zero reservations at a release attempt?
> > 2) Any objection to merging the assertion instead of the weird return to the
> > master branch?
> > 3) Any idea why the exception happens?
> >
> > [1]
> > https://gist.githubusercontent.com/anton-vinogradov/834fc63114a3e8d46b89ea4ccec8148b/raw/6438930c7fef119d0ad60df76d821fe7bd100c5e/gistfile1.txt
> > [2]
> > https://gitbox.apache.org/repos/asf?p=ignite.git;a=commitdiff;h=b2c083564fb3b48ebe87042e0ed442dc0af3a74d
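
To make the two checks discussed in this thread concrete, here is a minimal sketch of a reservation counter that fails loudly on an unmatched release and on a destroy with active reservations. It is not Ignite's GridDhtLocalPartition: the class name ReservablePartition, the bit layout of the state word, and the use of an explicit AssertionError (instead of a Java assert, which needs -ea) are all assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical, simplified stand-in for a partition with a reservation
 * counter. NOT the real GridDhtLocalPartition; the state layout below is
 * an assumption made for this sketch only.
 */
class ReservablePartition {
    /** Bit 32 marks the partition as destroyed; the low 32 bits hold the reservation count. */
    private static final long DESTROYED = 1L << 32;

    private final AtomicLong state = new AtomicLong();

    /** Extracts the reservation count (low 32 bits) from a state word. */
    private static int reservations(long state) {
        return (int) state;
    }

    /** Current reservation count, exposed for inspection. */
    int reservations() {
        return reservations(state.get());
    }

    /** Tries to reserve; returns false if the partition is already destroyed. */
    boolean reserve() {
        while (true) {
            long s = state.get();

            if ((s & DESTROYED) != 0)
                return false;

            if (state.compareAndSet(s, s + 1))
                return true;
        }
    }

    /** Releases one reservation; fails instead of silently returning when the count is zero. */
    void release() {
        while (true) {
            long s = state.get();

            // The check questioned in the thread: zero here means an unmatched release.
            if (reservations(s) == 0)
                throw new AssertionError("release() without a matching reserve()");

            if (state.compareAndSet(s, s - 1))
                return;
        }
    }

    /** Marks the partition destroyed; fails if anything still holds a reservation. */
    void destroy() {
        while (true) {
            long s = state.get();

            // The check proposed in the thread: destroy must not race with a reserved partition.
            if (reservations(s) != 0)
                throw new AssertionError("destroy() with " + reservations(s) + " active reservation(s)");

            if (state.compareAndSet(s, s | DESTROYED))
                return;
        }
    }
}
```

With checks like these, a destroy that overlaps a reserved commit would presumably fail at the destroy call itself rather than later inside the tree operation, which may help narrow down where the eviction/renting logic goes wrong.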
