Ivan Bessonov created IGNITE-15990:
--------------------------------------

             Summary: AssertionError in B+Tree under load
                 Key: IGNITE-15990
                 URL: https://issues.apache.org/jira/browse/IGNITE-15990
             Project: Ignite
          Issue Type: Bug
          Components: data structures
            Reporter: Ivan Bessonov
            Assignee: Ivan Bessonov
             Fix For: 2.13


Following exception has occurred in GridGain CE 8.7.29:
{code:java}
class 
org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
 B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-830802122, 
val2=844420635165316]], cacheId=-112787658, cacheName=TABLEDSS, 
indexName=TABLEDSS_TTL_DESC_IDX, msg=Runtime failure on search row: 
Row@28da1b19[ ... ]
        at 
org.apache.ignite.internal.processors.query.h2.database.H2Tree.corruptedTreeException(H2Tree.java:927)
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:2103)
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removex(BPlusTree.java:1899)
        at 
org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.removex(H2TreeIndex.java:515)
        at 
org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.remove(GridH2Table.java:822)
        at 
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.remove(IgniteH2Indexing.java:420)
        at 
org.apache.ignite.internal.processors.query.GridQueryProcessor.remove(GridQueryProcessor.java:2952)
        at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.remove(GridCacheQueryManager.java:437)
        at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishRemove(IgniteCacheOffheapManagerImpl.java:2703)
        at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1727)
        at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1694)
        at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2643)
        at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:435)
        at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2329)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2596)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:2056)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1873)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1674)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.sendSingleRequest(GridNearAtomicAbstractUpdateFuture.java:319)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.map(GridNearAtomicSingleUpdateFuture.java:496)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapOnTopology(GridNearAtomicSingleUpdateFuture.java:454)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.map(GridNearAtomicAbstractUpdateFuture.java:267)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.remove0(GridDhtAtomicCache.java:1192)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.remove0(GridDhtAtomicCache.java:728)
        at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter.remove(GridCacheAdapter.java:2858)
        at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter.remove(GridCacheAdapter.java:2835)
        at 
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.remove(IgniteCacheProxyImpl.java:1431)
        at 
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.remove(GatewayProtectedCacheProxy.java:913)
        at 
org.apache.ignite.internal.processors.platform.cache.PlatformCache.processInStreamOutLong(PlatformCache.java:521)
        at 
org.apache.ignite.internal.processors.platform.PlatformTargetProxyImpl.inStreamOutLong(PlatformTargetProxyImpl.java:66)
        at 
org.apache.ignite.internal.processors.platform.callback.PlatformCallbackUtils.inLongOutLong(Native
 Method)
        at 
org.apache.ignite.internal.processors.platform.callback.PlatformCallbackGateway.computeJobExecute(PlatformCallbackGateway.java:328)
        at 
org.apache.ignite.internal.processors.platform.compute.PlatformClosureJob.execute0(PlatformClosureJob.java:75)
        at 
org.apache.ignite.internal.processors.platform.compute.PlatformAbstractJob.execute(PlatformAbstractJob.java:79)
        at 
org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:570)
        at 
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7121)
        at 
org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:564)
        at 
org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:491)
        at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
        at 
org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1344)
        at 
org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:2102)
        at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1721)
        at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1328)
        at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4600(GridIoManager.java:157)
        at 
org.apache.ignite.internal.managers.communication.GridIoManager$8.execute(GridIoManager.java:1213)
        at 
org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:54)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.AssertionError: tail=Tail[pageId=0002ffff00000284, cnt= 1, 
lvl=2, sibling=null], res=NOT_FOUND
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:2083)
        ... 47 more {code}
The reason behind it is a broken tree invariant that last leaf item must also 
be present in one of inner nodes (unless it’s a last element in the entire 
tree). In the assertion above it’s implied that all inner nodes chain from the 
leaf to the root have been checked and an item has not been found.

Bug appears when Ignite node experiences slow inserts and very rapid deletions 
(caused by TTL, for example). Let’s examine the following tree configuration 
(assuming that 2 items in the node is the max):
{code:java}
                       [ R:0 ]
                     /         \
      [ X:0 | K:1 ]              ...
    /       |       \
...      [ K:1 ]      [ R:0 ] {code}
Key {{K}} appears twice in the tree and update {{K=2}} would be performed in 
two steps: inner node first, leaf node last. Code that does that can be found 
in {{BPlusTree.Put#found}} method. These is a time gap between these two 
modifications where none of tree’s pages are locked, and other threads are free 
to attempt any other tree updates. Corruption happens when other thread tries 
to remove {{R}} during the gap. Here are tree states during these operations:
{code:java}
// First half of replace operation.
                       [ R:0 ]
                     /         \
      [ X:0 | K:2 ]              ...              // Replace happed in inner 
node at first.
    /       |       \
...      [ K:1 ]      [ R:0 ]


// Remove R starts deleting an item from the leaf.
                       [ R:0 ]
                     /         \
      [ X:0 | K:2 ]              ...
    /       |       \
...      [ K:1 ]      [ ]                         // Leaf is empty, merge it 
about to happen.


// Empty branch is merged with its left neighbour.
                [ R:0 ]
              /         \
      [ X:0 ]              ...                   // K:2 item has been removed 
during the merge.
    /         \
...             [ K:1 ]


// R in the root is finally replaced with appropriate item and removal is 
complete.
                [ K:1 ]                         // K:1 is copied from the leaf 
as the closest left item.
              /         \
      [ X:0 ]              ...
    /         \
...             [ K:1 ]


// K:2 thread wakes up and continues updating downwards, changing the value in 
the leaf. Operation completed.
                [ K:1 ]
              /         \
      [ X:0 ]              ...
    /         \
...             [ K:2 ] {code}
As a result of these events, root points to a nonexistent value {{{}1{}}}, 
which is most likely an invalid pointer. In case of H2 indexes, everything 
seems to be working, because actual columns are inlined and the pointer is 
never read. {{AssertionError}} from the beginning happens when one tries to 
delete {{{}K{}}}. As a last item in the leaf, it must have a copy in inner 
node, but it’s not the case since {{K:1}} is not equal to {{{}K:2{}}}.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to