[
https://issues.apache.org/jira/browse/IGNITE-18715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697408#comment-17697408
]
Sergey Chugunov commented on IGNITE-18715:
------------------------------------------
[~nifeng2xing], do you have a reproducer for this problem? From the stack
trace provided in the ticket it is clear that the corruption has happened
during index manipulations, in
org.apache.ignite.internal.cache.query.index.sorted.inline.InlineIndexTree.
However, that is not enough to narrow down the exact scenario. A reproducer is
needed, or at least the persistence artifacts (partition files, index.bin file,
WAL segments) that were on disk after the node crash.
At the same time, it should be possible to fix the issue by deleting the
index.bin file for the affected cache; in that case the node should be able to
start again. It will need to rebuild all secondary indexes that are stored in
the index.bin file, but restoring normal operation for that node should be
possible in theory.
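As a rough illustration of that workaround, here is a minimal sketch that moves
index.bin aside instead of deleting it, so the file is still available as a
persistence artifact for later analysis. It makes several assumptions that are
not stated in the ticket: the node is stopped, and the caller passes the
on-disk directory of the affected cache or cache group (in a default layout
this is usually something like <work>/db/<node folder>/cacheGroup-nav_mem_part,
but verify that on your installation). The class and method names are
hypothetical and are not part of Ignite.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper, not part of Ignite: moves index.bin out of the way. */
public class IndexBinBackup {
    /**
     * Renames index.bin inside the given cache directory to a timestamped
     * backup name. Run this only while the node is stopped (assumption).
     *
     * @param cacheDir directory of the affected cache or cache group.
     * @return path of the backup file, or null if no index.bin was found.
     */
    public static Path moveIndexAside(Path cacheDir) throws IOException {
        Path idx = cacheDir.resolve("index.bin");

        if (!Files.isRegularFile(idx))
            return null;

        Path backup = cacheDir.resolve("index.bin.bak-" + System.currentTimeMillis());

        // Same-directory rename, so an atomic move is expected to be supported.
        return Files.move(idx, backup, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: IndexBinBackup <cache directory>");
            return;
        }

        Path backup = moveIndexAside(Paths.get(args[0]));

        System.out.println(backup == null ? "No index.bin found" : "Moved to " + backup);
    }
}
```

Keeping the renamed file rather than removing it is a choice made for this
sketch. With index.bin absent, the node is expected to rebuild the secondary
indexes on the next start, as described above.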
> B+Tree corruption error caused Ignite cluster crash and not able restart
> ------------------------------------------------------------------------
>
> Key: IGNITE-18715
> URL: https://issues.apache.org/jira/browse/IGNITE-18715
> Project: Ignite
> Issue Type: Bug
> Components: cache
> Affects Versions: 2.14
> Reporter: Isaac Zhu
> Priority: Blocker
>
> With version 2.14, we see this error while doing cache remove & put
> operations. After this happens, the cluster can't be restarted and all data
> is lost:
> [00:48:21,922][SEVERE][sys-stripe-8-#9][] Critical system error detected.
> Will be handled accordingly to configured handler
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=FailureContext [type=CRITICAL_ERROR, err=class
> o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is
> corrupted [groupId=828437433, pageIds=[217017202749409008],
> cacheId=-595580467, cacheName=SQL_PUBLIC_URT_TESTCASE_RESULTS_VCS,
> indexName=URT_TESTCASE_RESULTS_VCS_STATUS_IDX, groupName=nav_mem_part,
> msg=Runtime failure on search row: Row@7b8be26c[ key:
> SQL_PUBLIC_URT_TESTCASE_RESULTS_VCS_070eda86_0aab_4da3_900c_8d3baf08b3a7_KEY
> [idHash=1221100184, hash=701595465, TEST_CASE_ID=610062, GROUP_ID=497], val:
> SQL_PUBLIC_URT_TESTCASE_RESULTS_VCS_070eda86_0aab_4da3_900c_8d3baf08b3a7
> [idHash=1823360128, hash=-882680090, NEW=null, STATUS=MARK_FOR_DELETION,
> DIFF_FILE_ID=610, EXEC_TIME=67, TIMED_OUT=null, RECORD_TIME=2023-02-05
> 20:26:30.234, FILE_MOD_TIME=2023-02-02 01:35:59.0, SINCE=2022-10-22
> 00:00:00.0, JIRA=, ERROR_TOOL=null, ERROR_CODE=null, BUILD_DATE=2023-02-01
> 20:00:00.0, PRODUCER_START_TIME=2023-02-03 00:10:41.0] ][ MARK_FOR_DELETION,
> 497, 610062 ]]]] class
> org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
> B+Tree is corrupted [groupId=828437433, pageIds=[217017202749409008],
> cacheId=-595580467, cacheName=SQL_PUBLIC_URT_TESTCASE_RESULTS_VCS,
> indexName=URT_TESTCASE_RESULTS_VCS_STATUS_IDX, groupName=nav_mem_part,
> msg=Runtime failure on search row: Row@7b8be26c[ key:
> SQL_PUBLIC_URT_TESTCASE_RESULTS_VCS_070eda86_0aab_4da3_900c_8d3baf08b3a7_KEY
> [idHash=1221100184, hash=701595465, TEST_CASE_ID=610062, GROUP_ID=497], val:
> SQL_PUBLIC_URT_TESTCASE_RESULTS_VCS_070eda86_0aab_4da3_900c_8d3baf08b3a7
> [idHash=1823360128, hash=-882680090, NEW=null, STATUS=MARK_FOR_DELETION,
> DIFF_FILE_ID=610, EXEC_TIME=67, TIMED_OUT=null, RECORD_TIME=2023-02-05
> 20:26:30.234, FILE_MOD_TIME=2023-02-02 01:35:59.0, SINCE=2022-10-22
> 00:00:00.0, JIRA=, ERROR_TOOL=null, ERROR_CODE=null, BUILD_DATE=2023-02-01
> 20:00:00.0, PRODUCER_START_TIME=2023-02-03 00:10:41.0] ][ MARK_FOR_DELETION,
> 497, 610062 ]] at
> org.apache.ignite.internal.cache.query.index.sorted.inline.InlineIndexTree.corruptedTreeException(InlineIndexTree.java:561)
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:2310)
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removex(BPlusTree.java:2079)
> at
> org.apache.ignite.internal.cache.query.index.sorted.inline.InlineIndexImpl.remove(InlineIndexImpl.java:377)
> at
> org.apache.ignite.internal.cache.query.index.sorted.inline.InlineIndexImpl.onUpdate(InlineIndexImpl.java:330)
> at
> org.apache.ignite.internal.cache.query.index.IndexProcessor.updateIndex(IndexProcessor.java:465)
> at
> org.apache.ignite.internal.cache.query.index.IndexProcessor.updateIndexes(IndexProcessor.java:308)
> at
> org.apache.ignite.internal.cache.query.index.IndexProcessor.store(IndexProcessor.java:156)
> at
> org.apache.ignite.internal.processors.query.GridQueryProcessor.store(GridQueryProcessor.java:2741)
> at
> org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.store(GridCacheQueryManager.java:420)
> at
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishUpdate(IgniteCacheOffheapManagerImpl.java:2629)
> at
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishUpdate(IgniteCacheOffheapManagerImpl.java:2611)
> at
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.update(IgniteCacheOffheapManagerImpl.java:2510)
> at
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.update(GridCacheOffheapManager.java:2600)
> at
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.update(IgniteCacheOffheapManagerImpl.java:440)
> at
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyUpdate(GridCacheDatabaseSharedManager.java:2987)
> at
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.lambda$applyLogicalUpdates$29(GridCacheDatabaseSharedManager.java:2775)
> at
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.lambda$stripedApply$28(GridCacheDatabaseSharedManager.java:2455)
> at
> org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:637)
> at
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125) at
> java.lang.Thread.run(Thread.java:748) Caused by:
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTreeRuntimeException:
> java.lang.IllegalStateException: Item not found: 19 at
> org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.doInitFromLink(CacheDataRowAdapter.java:345)
> at
> org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:165)
> at
> org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:136)
> at
> org.apache.ignite.internal.cache.query.index.sorted.inline.InlineIndexTree.createIndexRow(InlineIndexTree.java:360)
> at
> org.apache.ignite.internal.cache.query.index.sorted.inline.io.AbstractInlineLeafIO.getLookupRow(AbstractInlineLeafIO.java:129)
> at
> org.apache.ignite.internal.cache.query.index.sorted.inline.io.AbstractInlineLeafIO.getLookupRow(AbstractInlineLeafIO.java:37)
> at
> org.apache.ignite.internal.cache.query.index.sorted.inline.InlineIndexTree.getRow(InlineIndexTree.java:403)
> at
> org.apache.ignite.internal.cache.query.index.sorted.inline.InlineIndexTree.getRow(InlineIndexTree.java:72)
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.getRow(BPlusTree.java:5693)
> at
> org.apache.ignite.internal.cache.query.index.sorted.inline.InlineIndexTree.compare(InlineIndexTree.java:309)
> at
> org.apache.ignite.internal.cache.query.index.sorted.inline.InlineIndexTree.compare(InlineIndexTree.java:72)
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.compare(BPlusTree.java:5680)
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findInsertionPoint(BPlusTree.java:5600)
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$1100(BPlusTree.java:162)
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Search.run0(BPlusTree.java:369)
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:6216)
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Search.run(BPlusTree.java:349)
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:6202)
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.readPage(PageHandler.java:174)
> at
> org.apache.ignite.internal.processors.cache.persistence.DataStructure.read(DataStructure.java:415)
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.read(BPlusTree.java:6403)
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:2345)
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:2364)
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:2364)
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:2364)
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:2364)
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:2272)
> ... 19 more Caused by: java.lang.IllegalStateException: Item not found: 19
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.findIndirectItemIndex(AbstractDataPageIO.java:488)
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.getDataOffset(AbstractDataPageIO.java:596)
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.readPayload(AbstractDataPageIO.java:638)
> at
> org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.readIncomplete(CacheDataRowAdapter.java:380)
> at
> org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.doInitFromLink(CacheDataRowAdapter.java:316)
> ... 45 more