Ivan Bessonov created IGNITE-15990:
--------------------------------------
Summary: AssertionError in B+Tree under load
Key: IGNITE-15990
URL: https://issues.apache.org/jira/browse/IGNITE-15990
Project: Ignite
Issue Type: Bug
Components: data structures
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov
Fix For: 2.13
Following exception has occurred in GridGain CE 8.7.29:
{code:java}
class
org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-830802122,
val2=844420635165316]], cacheId=-112787658, cacheName=TABLEDSS,
indexName=TABLEDSS_TTL_DESC_IDX, msg=Runtime failure on search row:
Row@28da1b19[ ... ]
at
org.apache.ignite.internal.processors.query.h2.database.H2Tree.corruptedTreeException(H2Tree.java:927)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:2103)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removex(BPlusTree.java:1899)
at
org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.removex(H2TreeIndex.java:515)
at
org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.remove(GridH2Table.java:822)
at
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.remove(IgniteH2Indexing.java:420)
at
org.apache.ignite.internal.processors.query.GridQueryProcessor.remove(GridQueryProcessor.java:2952)
at
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.remove(GridCacheQueryManager.java:437)
at
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishRemove(IgniteCacheOffheapManagerImpl.java:2703)
at
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1727)
at
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1694)
at
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2643)
at
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:435)
at
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2329)
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2596)
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:2056)
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1873)
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1674)
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.sendSingleRequest(GridNearAtomicAbstractUpdateFuture.java:319)
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.map(GridNearAtomicSingleUpdateFuture.java:496)
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapOnTopology(GridNearAtomicSingleUpdateFuture.java:454)
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.map(GridNearAtomicAbstractUpdateFuture.java:267)
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.remove0(GridDhtAtomicCache.java:1192)
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.remove0(GridDhtAtomicCache.java:728)
at
org.apache.ignite.internal.processors.cache.GridCacheAdapter.remove(GridCacheAdapter.java:2858)
at
org.apache.ignite.internal.processors.cache.GridCacheAdapter.remove(GridCacheAdapter.java:2835)
at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.remove(IgniteCacheProxyImpl.java:1431)
at
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.remove(GatewayProtectedCacheProxy.java:913)
at
org.apache.ignite.internal.processors.platform.cache.PlatformCache.processInStreamOutLong(PlatformCache.java:521)
at
org.apache.ignite.internal.processors.platform.PlatformTargetProxyImpl.inStreamOutLong(PlatformTargetProxyImpl.java:66)
at
org.apache.ignite.internal.processors.platform.callback.PlatformCallbackUtils.inLongOutLong(Native
Method)
at
org.apache.ignite.internal.processors.platform.callback.PlatformCallbackGateway.computeJobExecute(PlatformCallbackGateway.java:328)
at
org.apache.ignite.internal.processors.platform.compute.PlatformClosureJob.execute0(PlatformClosureJob.java:75)
at
org.apache.ignite.internal.processors.platform.compute.PlatformAbstractJob.execute(PlatformAbstractJob.java:79)
at
org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:570)
at
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7121)
at
org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:564)
at
org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:491)
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
at
org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1344)
at
org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:2102)
at
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1721)
at
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1328)
at
org.apache.ignite.internal.managers.communication.GridIoManager.access$4600(GridIoManager.java:157)
at
org.apache.ignite.internal.managers.communication.GridIoManager$8.execute(GridIoManager.java:1213)
at
org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:54)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.AssertionError: tail=Tail[pageId=0002ffff00000284, cnt= 1,
lvl=2, sibling=null], res=NOT_FOUND
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:2083)
... 47 more {code}
The reason behind it is a broken tree invariant that last leaf item must also
be present in one of inner nodes (unless it’s a last element in the entire
tree). In the assertion above it’s implied that all inner nodes chain from the
leaf to the root have been checked and an item has not been found.
Bug appears when Ignite node experiences slow inserts and very rapid deletions
(caused by TTL, for example). Let’s examine the following tree configuration
(assuming that 2 items in the node is the max):
{code:java}
[ R:0 ]
/ \
[ X:0 | K:1 ] ...
/ | \
... [ K:1 ] [ R:0 ] {code}
Key {{K}} appears twice in the tree and update {{K=2}} would be performed in
two steps: inner node first, leaf node last. Code that does that can be found
in {{BPlusTree.Put#found}} method. These is a time gap between these two
modifications where none of tree’s pages are locked, and other threads are free
to attempt any other tree updates. Corruption happens when other thread tries
to remove {{R}} during the gap. Here are tree states during these operations:
{code:java}
// First half of replace operation.
[ R:0 ]
/ \
[ X:0 | K:2 ] ... // Replace happed in inner
node at first.
/ | \
... [ K:1 ] [ R:0 ]
// Remove R starts deleting an item from the leaf.
[ R:0 ]
/ \
[ X:0 | K:2 ] ...
/ | \
... [ K:1 ] [ ] // Leaf is empty, merge it
about to happen.
// Empty branch is merged with its left neighbour.
[ R:0 ]
/ \
[ X:0 ] ... // K:2 item has been removed
during the merge.
/ \
... [ K:1 ]
// R in the root is finally replaced with appropriate item and removal is
complete.
[ K:1 ] // K:1 is copied from the leaf
as the closest left item.
/ \
[ X:0 ] ...
/ \
... [ K:1 ]
// K:2 thread wakes up and continues updating downwards, changing the value in
the leaf. Operation completed.
[ K:1 ]
/ \
[ X:0 ] ...
/ \
... [ K:2 ] {code}
As a result of these events, root points to a nonexistent value {{{}1{}}},
which is most likely an invalid pointer. In case of H2 indexes, everything
seems to be working, because actual columns are inlined and the pointer is
never read. {{AssertionError}} from the beginning happens when one tries to
delete {{{}K{}}}. As a last item in the leaf, it must have a copy in inner
node, but it’s not the case since {{K:1}} is not equal to {{{}K:2{}}}.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)