[
https://issues.apache.org/jira/browse/IGNITE-15227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17410511#comment-17410511
]
Ignite TC Bot commented on IGNITE-15227:
----------------------------------------
{panel:title=Branch: [pull/9292/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/9292/head] Base: [master] : New Tests (2)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#00008b}PDS 4{color} [[tests 2|https://ci.ignite.apache.org/viewLog.html?buildId=6163406]]
* {color:#013220}IgnitePdsTestSuite4: PagesPossibleCorruptionDiagnosticTest.testCorruptedNodeFailsOnStart - PASSED{color}
* {color:#013220}IgnitePdsTestSuite4: PagesPossibleCorruptionDiagnosticTest.testDiagnosticCollectedOnCorruptedPageList - PASSED{color}
{panel}
[TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=6163433&buildTypeId=IgniteTests24Java8_RunAll]
> Improve diagnostic capabilities of persistence corruptions
> ----------------------------------------------------------
>
> Key: IGNITE-15227
> URL: https://issues.apache.org/jira/browse/IGNITE-15227
> Project: Ignite
> Issue Type: Improvement
> Reporter: Denis Chudov
> Assignee: Denis Chudov
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
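> Diagnostics of persistence corruptions currently have the following problems:
> * An assertion failure inside PagesList can surface as a CorruptedTreeException,
> which is misleading: the failure originates in the free list, not in the B+Tree. Example: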
> {code:java}
> 2020-11-30 20:17:27.170[ERROR]sys-stripe-29-#30%DPL_GRID%DplGridNodeName%[org.apache.ignite.Ignite] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-782612924, val2=72372732968376779]], groupName=CACHEGROUP_PARTICLE_union-module_com.sbt.processing.data.partition.dpl.PartitionKey, msg=Runtime failure on search row: SearchRow [key=KeyCacheObject [hasValBytes=true], hash=513719283, cacheId=-295471981]]]]
> org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-782612924, val2=72372732968376779]], groupName=CACHEGROUP_PARTICLE_union-module_com.sbt.processing.data.partition.dpl.PartitionKey, msg=Runtime failure on search row: SearchRow [key=KeyCacheObject [hasValBytes=true], hash=513719283, cacheId=-295471981]]
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:6117)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1937)
>     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1670)
>     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1653)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2519)
>     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:436)
>     at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4312)
>     at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4289)
>     at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerSet(GridCacheMapEntry.java:1555)
>     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.userCommit(IgniteTxLocalAdapter.java:756)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocalAdapter.localFinish(GridDhtTxLocalAdapter.java:794)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.localFinish(GridDhtTxLocal.java:605)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.finishTx(GridDhtTxLocal.java:477)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.commitDhtLocalAsync(GridDhtTxLocal.java:534)
>     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finishDhtLocal(IgniteTxHandler.java:1092)
>     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finish(IgniteTxHandler.java:968)
>     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxFinishRequest(IgniteTxHandler.java:923)
>     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$200(IgniteTxHandler.java:132)
>     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:229)
>     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:227)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1722)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1329)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.access$4600(GridIoManager.java:158)
>     at org.apache.ignite.internal.managers.communication.GridIoManager$8.execute(GridIoManager.java:1214)
>     at org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:54)
>     at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:559)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.AssertionError: Incorrectly recycled pageId in reuse bucket: ff011e9e000012f7
>     at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.takeEmptyPage(PagesList.java:1358)
>     at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList.insertDataRow(AbstractFreeList.java:517)
>     at org.apache.ignite.internal.processors.cache.persistence.freelist.CacheFreeList.insertDataRow(CacheFreeList.java:74)
>     at org.apache.ignite.internal.processors.cache.persistence.freelist.CacheFreeList.insertDataRow(CacheFreeList.java:35)
>     at org.apache.ignite.internal.processors.cache.persistence.RowStore.addRow(RowStore.java:112)
>     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.createRow(IgniteCacheOffheapManagerImpl.java:1720)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.createRow(GridCacheOffheapManager.java:2494)
>     at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$UpdateClosure.call(GridCacheMapEntry.java:5876)
>     at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$UpdateClosure.call(GridCacheMapEntry.java:5813)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(BPlusTree.java:4000)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$5700(BPlusTree.java:3894)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:2020)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1997)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1997)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1904)
> {code}
> * Corruption of partition meta also leads to a mismatched exception type in
> the pages list, e.g.:
> {code:java}
> 2021-01-29 05:48:41.644[ERROR][db-checkpoint-thread-#307%DPL_GRID%DplGridNodeName%][org.apache.ignite.Ignite] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.AssertionError: Missing tails [bucket=250, tails=null, metaPage=000120ca00002798]]]
> java.lang.AssertionError: Missing tails [bucket=250, tails=null, metaPage=000120ca00002798]
>     at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.updateTail(PagesList.java:624)
>     at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.mergeNoNext(PagesList.java:1628)
>     at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.removeDataPage(PagesList.java:1577)
>     at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList$RemoveRowHandler.run(AbstractFreeList.java:318)
>     at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList$RemoveRowHandler.run(AbstractFreeList.java:273)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writePage(PageHandler.java:292)
>     at org.apache.ignite.internal.processors.cache.persistence.DataStructure.write(DataStructure.java:273)
>     at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList.removeDataRowByLink(AbstractFreeList.java:633)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.saveStoreMetadata(GridCacheOffheapManager.java:367)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.lambda$syncMetadata$2(GridCacheOffheapManager.java:288)
>     at org.apache.ignite.internal.util.IgniteUtils.lambda$wrapIgniteFuture$3(IgniteUtils.java:11665)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> reproducer:
> [https://github.com/gridgain/apache-ignite/blob/2603e9a01bc1f6033b760ef02ebaba9a8069b84b/modules/core/src/test/java/org/apache/ignite/Reproducer12005.java]
> All such exceptions should be passed to DiagnosticProcessor and should contain
> the IDs of the pages that are possibly corrupted, so that they can be analyzed in PDS.
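The proposal above can be illustrated with a minimal sketch. It is not the actual change from the pull request: every class and method name below (PagesCorruptionSketch, PossiblyCorruptedPagesException, DiagnosticCollector, checkAndReport) is hypothetical, and the collector is only a stand-in for DiagnosticProcessor. The idea is that a bare assertion is replaced by a typed exception that carries the group ID and the suspect page IDs, so that a diagnostic component can pick them up. The group ID and page ID in the usage note are sample values taken from the logs above.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: all names here are hypothetical and are not Ignite's API. */
class PagesCorruptionSketch {
    /** Exception that records which pages may be corrupted. */
    static class PossiblyCorruptedPagesException extends RuntimeException {
        final int grpId;
        final long[] pageIds;

        PossiblyCorruptedPagesException(String msg, int grpId, long... pageIds) {
            super(msg + " [grpId=" + grpId + ", pageIds=" + hex(pageIds) + "]");
            this.grpId = grpId;
            this.pageIds = pageIds.clone();
        }

        /** Formats page IDs as hex strings, the form used in the logs. */
        static String hex(long[] ids) {
            List<String> res = new ArrayList<>();
            for (long id : ids)
                res.add(Long.toHexString(id));
            return res.toString();
        }
    }

    /** Stand-in for a diagnostic component; it only collects reported page IDs. */
    static class DiagnosticCollector {
        final List<Long> reportedPageIds = new ArrayList<>();

        void onFailure(PossiblyCorruptedPagesException e) {
            for (long id : e.pageIds)
                reportedPageIds.add(id);
        }
    }

    /** Throws a typed exception with the suspect page instead of a bare assertion. */
    static long takeEmptyPage(int grpId, long pageId, boolean recycled) {
        if (!recycled)
            throw new PossiblyCorruptedPagesException(
                "Incorrectly recycled pageId in reuse bucket", grpId, pageId);
        return pageId;
    }

    /** Runs the check and forwards a failure to the collector; returns true if one was reported. */
    static boolean checkAndReport(DiagnosticCollector diag, int grpId, long pageId, boolean recycled) {
        try {
            takeEmptyPage(grpId, pageId, recycled);
            return false;
        }
        catch (PossiblyCorruptedPagesException e) {
            diag.onFailure(e);
            return true;
        }
    }
}
```

For example, calling checkAndReport(diag, -782612924, 0xff011e9e000012f7L, false) leaves that page ID in diag.reportedPageIds, and the exception message includes it in hex form. Keeping the page IDs as a field rather than only in the message text lets the handler use them without parsing.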