Hello Anton, Thanks for digging into this. The logic with checking the reservations count seems fishy to me as well, so I have no objections with the suggested change. This "if" statement does not answer why the partition was being destroyed during the commit, though. Does the issue reproduce in subsequent runs?
The logic around reserve/release seems ok to me, however, the eviction/renting code looks overly complicated, perhaps, there is a bug somewhere there? I think we can add an assertion to GridDhtLocalPartition#destroy() method to check that reservations is 0 when this method is called (there is a check for EVICTED state already there) --AG чт, 9 янв. 2020 г. в 09:45, Anton Vinogradov <a...@apache.org>: > Folks, > Yardstick run (opt-serial-put-get-1-backup) failed with interesting > exception: > Critical system error detected. Will be handled accordingly to configured > handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, > super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet > [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], > failureCtx=FailureContext [type=CRITICAL_ERROR, err=class > o.a.i.i.transactions.IgniteTxHeuristicCheckedException: Committing a > transaction has produced runtime exception]] > class > org.apache.ignite.internal.transactions.IgniteTxHeuristicCheckedException: > Committing a transaction has produced runtime exception > at > > org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.heuristicException(IgniteTxAdapter.java:800) > at > > org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.commitIfLocked(GridDistributedTxRemoteAdapter.java:838) > at > > org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.commitRemoteTx(GridDistributedTxRemoteAdapter.java:893) > at > > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finish(IgniteTxHandler.java:1452) > at > > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processDhtTxFinishRequest(IgniteTxHandler.java:1375) > at > > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$600(IgniteTxHandler.java:123) > at > > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$7.apply(IgniteTxHandler.java:241) > at > > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$7.apply(IgniteTxHandler.java:239) > at > > org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142) > at > > org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591) > at > > org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392) > at > > org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318) > at > > org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109) > at > > org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308) > at > > org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1843) > at > > org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1468) > at > > org.apache.ignite.internal.managers.communication.GridIoManager.access$5200(GridIoManager.java:229) > at > > org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1365) > at > > org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:555) > at > org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.IllegalStateException: Tree is being concurrently > destroyed: tx-p-470##CacheData > at > > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.checkDestroyed(BPlusTree.java:1011) > at > > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1831) > at > > org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1696) > at > > org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1679) > at > > org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:441) > at > > org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4288) > at > > org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4262) > at > > org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerSet(GridCacheMapEntry.java:1540) > at > > org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.commitIfLocked(GridDistributedTxRemoteAdapter.java:675) > ... 19 more > > It seems, BPlusTree was destroyed between > GridDistributedTxRemoteAdapter.java:545 and > GridDistributedTxRemoteAdapter.java:675 while partition was reserved. > > See the full log [1] for details. > > During investigation weird code was found: > private void release0(int sizeChange) { > while (true) { > long state = this.state.get(); > > int reservations = getReservations(state); > > if (reservations == 0) // How can it be zero at release > attempt? > return; > > I've replaced this weird code with assertion [2] and checked at TeamCity > twice, nothing failed. > > So, questions > 1) Any Idea why we able to have zero reservations at release attempt? > 2) Any objection to merging assertion instead of weird return to the master > branch? > 3) Any Idea why the exception happens? > > [1] > > https://gist.githubusercontent.com/anton-vinogradov/834fc63114a3e8d46b89ea4ccec8148b/raw/6438930c7fef119d0ad60df76d821fe7bd100c5e/gistfile1.txt > [2] > > https://gitbox.apache.org/repos/asf?p=ignite.git;a=commitdiff;h=b2c083564fb3b48ebe87042e0ed442dc0af3a74d >