[
https://issues.apache.org/jira/browse/IGNITE-15990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luchnikov Alexander updated IGNITE-15990:
-----------------------------------------
Labels: ise.lts (was: )
> AssertionError in B+Tree under load
> -----------------------------------
>
> Key: IGNITE-15990
> URL: https://issues.apache.org/jira/browse/IGNITE-15990
> Project: Ignite
> Issue Type: Bug
> Components: data structures
> Reporter: Ivan Bessonov
> Assignee: Ivan Bessonov
> Priority: Major
> Labels: ise.lts
> Fix For: 2.13
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Following exception has occurred in GridGain CE 8.7.29:
> {code:java}
> class
> org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
> B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-830802122,
> val2=844420635165316]], cacheId=-112787658, cacheName=TABLEDSS,
> indexName=TABLEDSS_TTL_DESC_IDX, msg=Runtime failure on search row:
> Row@28da1b19[ ... ]
> at
> org.apache.ignite.internal.processors.query.h2.database.H2Tree.corruptedTreeException(H2Tree.java:927)
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:2103)
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removex(BPlusTree.java:1899)
> at
> org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.removex(H2TreeIndex.java:515)
> at
> org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.remove(GridH2Table.java:822)
> at
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.remove(IgniteH2Indexing.java:420)
> at
> org.apache.ignite.internal.processors.query.GridQueryProcessor.remove(GridQueryProcessor.java:2952)
> at
> org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.remove(GridCacheQueryManager.java:437)
> at
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishRemove(IgniteCacheOffheapManagerImpl.java:2703)
> at
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1727)
> at
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1694)
> at
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2643)
> at
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:435)
> at
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2329)
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2596)
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:2056)
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1873)
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1674)
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.sendSingleRequest(GridNearAtomicAbstractUpdateFuture.java:319)
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.map(GridNearAtomicSingleUpdateFuture.java:496)
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapOnTopology(GridNearAtomicSingleUpdateFuture.java:454)
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.map(GridNearAtomicAbstractUpdateFuture.java:267)
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.remove0(GridDhtAtomicCache.java:1192)
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.remove0(GridDhtAtomicCache.java:728)
> at
> org.apache.ignite.internal.processors.cache.GridCacheAdapter.remove(GridCacheAdapter.java:2858)
> at
> org.apache.ignite.internal.processors.cache.GridCacheAdapter.remove(GridCacheAdapter.java:2835)
> at
> org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.remove(IgniteCacheProxyImpl.java:1431)
> at
> org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.remove(GatewayProtectedCacheProxy.java:913)
> at
> org.apache.ignite.internal.processors.platform.cache.PlatformCache.processInStreamOutLong(PlatformCache.java:521)
> at
> org.apache.ignite.internal.processors.platform.PlatformTargetProxyImpl.inStreamOutLong(PlatformTargetProxyImpl.java:66)
> at
> org.apache.ignite.internal.processors.platform.callback.PlatformCallbackUtils.inLongOutLong(Native
> Method)
> at
> org.apache.ignite.internal.processors.platform.callback.PlatformCallbackGateway.computeJobExecute(PlatformCallbackGateway.java:328)
> at
> org.apache.ignite.internal.processors.platform.compute.PlatformClosureJob.execute0(PlatformClosureJob.java:75)
> at
> org.apache.ignite.internal.processors.platform.compute.PlatformAbstractJob.execute(PlatformAbstractJob.java:79)
> at
> org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:570)
> at
> org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7121)
> at
> org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:564)
> at
> org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:491)
> at
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
> at
> org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1344)
> at
> org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:2102)
> at
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1721)
> at
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1328)
> at
> org.apache.ignite.internal.managers.communication.GridIoManager.access$4600(GridIoManager.java:157)
> at
> org.apache.ignite.internal.managers.communication.GridIoManager$8.execute(GridIoManager.java:1213)
> at
> org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:54)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.AssertionError: tail=Tail[pageId=0002ffff00000284, cnt=
> 1, lvl=2, sibling=null], res=NOT_FOUND
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:2083)
> ... 47 more {code}
> The reason behind it is a broken tree invariant that last leaf item must also
> be present in one of inner nodes (unless it’s a last element in the entire
> tree). In the assertion above it’s implied that all inner nodes chain from
> the leaf to the root have been checked and an item has not been found.
> Bug appears when Ignite node experiences slow inserts and very rapid
> deletions (caused by TTL, for example). Let’s examine the following tree
> configuration (assuming that 2 items in the node is the max):
> {code:java}
> [ R:0 ]
> / \
> [ X:0 | K:1 ] ...
> / | \
> ... [ K:1 ] [ R:0 ] {code}
> Key {{K}} appears twice in the tree and update {{K=2}} would be performed in
> two steps: inner node first, leaf node last. Code that does that can be found
> in {{BPlusTree.Put#found}} method. These is a time gap between these two
> modifications where none of tree’s pages are locked, and other threads are
> free to attempt any other tree updates. Corruption happens when other thread
> tries to remove {{R}} during the gap. Here are tree states during these
> operations:
> {code:java}
> // First half of replace operation.
> [ R:0 ]
> / \
> [ X:0 | K:2 ] ... // Replace happed in inner
> node at first.
> / | \
> ... [ K:1 ] [ R:0 ]
> // Remove R starts deleting an item from the leaf.
> [ R:0 ]
> / \
> [ X:0 | K:2 ] ...
> / | \
> ... [ K:1 ] [ ] // Leaf is empty, merge it
> about to happen.
> // Empty branch is merged with its left neighbour.
> [ R:0 ]
> / \
> [ X:0 ] ... // K:2 item has been removed
> during the merge.
> / \
> ... [ K:1 ]
> // R in the root is finally replaced with appropriate item and removal is
> complete.
> [ K:1 ] // K:1 is copied from the
> leaf as the closest left item.
> / \
> [ X:0 ] ...
> / \
> ... [ K:1 ]
> // K:2 thread wakes up and continues updating downwards, changing the value
> in the leaf. Operation completed.
> [ K:1 ]
> / \
> [ X:0 ] ...
> / \
> ... [ K:2 ] {code}
> As a result of these events, root points to a nonexistent value {{{}1{}}},
> which is most likely an invalid pointer. In case of H2 indexes, everything
> seems to be working, because actual columns are inlined and the pointer is
> never read. {{AssertionError}} from the beginning happens when one tries to
> delete {{{}K{}}}. As a last item in the leaf, it must have a copy in inner
> node, but it’s not the case since {{K:1}} is not equal to {{{}K:2{}}}.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)