Hello Anton,

Thanks for digging into this. The check on the reservations count looks
fishy to me as well, so I have no objections to the suggested change. This
"if" statement does not explain why the partition was being destroyed
during the commit, though. Does the issue reproduce in subsequent runs?

The reserve/release logic seems OK to me. The eviction/renting code,
however, looks overly complicated; perhaps there is a bug somewhere in
there? I think we can add an assertion to the
GridDhtLocalPartition#destroy() method to check that the reservations count
is 0 when the method is called (a check for the EVICTED state is already
there).
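
To make the two checks concrete, here is a minimal sketch of what I have in
mind. It is not the real GridDhtLocalPartition code: the class
PartitionSketch, its plain reservation counter, and the destroyed flag are
all hypothetical simplifications, and the checks throw AssertionError
directly so they fire even without -ea. It also does not try to close the
race between reserve() and destroy(); it only shows where the checks would
sit.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a partition; NOT the real GridDhtLocalPartition. */
class PartitionSketch {
    /** Reservation counter only (assumption: the real class packs more into its state). */
    private final AtomicLong reservations = new AtomicLong();

    /** Set once destroy() has passed its check. */
    private volatile boolean destroyed;

    /** Takes a reservation unless the partition is already destroyed. */
    boolean reserve() {
        if (destroyed)
            return false;

        reservations.incrementAndGet();

        return true;
    }

    /** Drops a reservation; fails loudly instead of silently returning at zero. */
    void release() {
        while (true) {
            long cur = reservations.get();

            if (cur == 0)
                throw new AssertionError("release() without a matching reserve()");

            if (reservations.compareAndSet(cur, cur - 1))
                return;
        }
    }

    /** Destroys the partition; fails loudly if anyone still holds a reservation. */
    void destroy() {
        long cur = reservations.get();

        if (cur != 0)
            throw new AssertionError("destroy() called with reservations=" + cur);

        destroyed = true;
    }

    /** Current number of outstanding reservations. */
    long reservationCount() {
        return reservations.get();
    }
}
```

With checks like these, a destroy() during a reserved commit would fail at
the point of destruction rather than later inside the tree.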

--AG

Thu, Jan 9, 2020 at 09:45, Anton Vinogradov <a...@apache.org>:

> Folks,
> Yardstick run (opt-serial-put-get-1-backup) failed with interesting
> exception:
> Critical system error detected. Will be handled accordingly to configured
> handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=FailureContext [type=CRITICAL_ERROR, err=class
> o.a.i.i.transactions.IgniteTxHeuristicCheckedException: Committing a
> transaction has produced runtime exception]]
> class org.apache.ignite.internal.transactions.IgniteTxHeuristicCheckedException: Committing a transaction has produced runtime exception
> at org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.heuristicException(IgniteTxAdapter.java:800)
> at org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.commitIfLocked(GridDistributedTxRemoteAdapter.java:838)
> at org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.commitRemoteTx(GridDistributedTxRemoteAdapter.java:893)
> at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finish(IgniteTxHandler.java:1452)
> at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processDhtTxFinishRequest(IgniteTxHandler.java:1375)
> at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$600(IgniteTxHandler.java:123)
> at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$7.apply(IgniteTxHandler.java:241)
> at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$7.apply(IgniteTxHandler.java:239)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
> at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1843)
> at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1468)
> at org.apache.ignite.internal.managers.communication.GridIoManager.access$5200(GridIoManager.java:229)
> at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1365)
> at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:555)
> at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: Tree is being concurrently destroyed: tx-p-470##CacheData
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.checkDestroyed(BPlusTree.java:1011)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1831)
> at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1696)
> at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1679)
> at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:441)
> at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4288)
> at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4262)
> at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerSet(GridCacheMapEntry.java:1540)
> at org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.commitIfLocked(GridDistributedTxRemoteAdapter.java:675)
> ... 19 more
>
> It seems the BPlusTree was destroyed between
> GridDistributedTxRemoteAdapter.java:545 and
> GridDistributedTxRemoteAdapter.java:675 while the partition was reserved.
>
> See the full log [1] for details.
>
> During the investigation I found some weird code:
> private void release0(int sizeChange) {
>         while (true) {
>             long state = this.state.get();
>
>             int reservations = getReservations(state);
>
>             if (reservations == 0) // How can it be zero at release attempt?
>                 return;
>
> I've replaced this weird code with an assertion [2] and ran it on TeamCity
> twice; nothing failed.
>
> So, questions:
> 1) Any idea why we are able to have zero reservations at a release attempt?
> 2) Any objection to merging the assertion instead of the weird return into
> the master branch?
> 3) Any idea why the exception happens?
>
> [1] https://gist.githubusercontent.com/anton-vinogradov/834fc63114a3e8d46b89ea4ccec8148b/raw/6438930c7fef119d0ad60df76d821fe7bd100c5e/gistfile1.txt
> [2] https://gitbox.apache.org/repos/asf?p=ignite.git;a=commitdiff;h=b2c083564fb3b48ebe87042e0ed442dc0af3a74d
>
