[
https://issues.apache.org/jira/browse/IGNITE-15227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Denis Chudov updated IGNITE-15227:
----------------------------------
Description:
There are some diagnostic problems:
* assertions inside PagesList can surface as CorruptedTreeException, which is
misleading because the failure originates in the pages list, not in the
B+Tree. Example:
{code:java}
2020-11-30
20:17:27.170[ERROR]sys-stripe-29-#30%DPL_GRID%DplGridNodeName%[org.apache.ignite.Ignite]
Critical system error detected. Will be handled accordingly to configured
handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
failureCtx=FailureContext [type=CRITICAL_ERROR, err=class
o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is
corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-782612924,
val2=72372732968376779]],
groupName=CACHEGROUP_PARTICLE_union-module_com.sbt.processing.data.partition.dpl.PartitionKey,
msg=Runtime failure on search row: SearchRow [key=KeyCacheObject
[hasValBytes=true], hash=513719283, cacheId=-295471981]]]]
org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-782612924,
val2=72372732968376779]],
groupName=CACHEGROUP_PARTICLE_union-module_com.sbt.processing.data.partition.dpl.PartitionKey,
msg=Runtime failure on search row: SearchRow [key=KeyCacheObject
[hasValBytes=true], hash=513719283, cacheId=-295471981]]
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:6117)
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1937)
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1670)
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1653)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2519)
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:436)
    at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4312)
    at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4289)
    at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerSet(GridCacheMapEntry.java:1555)
    at org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.userCommit(IgniteTxLocalAdapter.java:756)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocalAdapter.localFinish(GridDhtTxLocalAdapter.java:794)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.localFinish(GridDhtTxLocal.java:605)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.finishTx(GridDhtTxLocal.java:477)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.commitDhtLocalAsync(GridDhtTxLocal.java:534)
    at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finishDhtLocal(IgniteTxHandler.java:1092)
    at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finish(IgniteTxHandler.java:968)
    at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxFinishRequest(IgniteTxHandler.java:923)
    at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$200(IgniteTxHandler.java:132)
    at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:229)
    at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:227)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
    at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1722)
    at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1329)
    at org.apache.ignite.internal.managers.communication.GridIoManager.access$4600(GridIoManager.java:158)
    at org.apache.ignite.internal.managers.communication.GridIoManager$8.execute(GridIoManager.java:1214)
    at org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:54)
    at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:559)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.AssertionError: Incorrectly recycled pageId in reuse bucket: ff011e9e000012f7
    at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.takeEmptyPage(PagesList.java:1358)
    at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList.insertDataRow(AbstractFreeList.java:517)
    at org.apache.ignite.internal.processors.cache.persistence.freelist.CacheFreeList.insertDataRow(CacheFreeList.java:74)
    at org.apache.ignite.internal.processors.cache.persistence.freelist.CacheFreeList.insertDataRow(CacheFreeList.java:35)
    at org.apache.ignite.internal.processors.cache.persistence.RowStore.addRow(RowStore.java:112)
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.createRow(IgniteCacheOffheapManagerImpl.java:1720)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.createRow(GridCacheOffheapManager.java:2494)
    at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$UpdateClosure.call(GridCacheMapEntry.java:5876)
    at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$UpdateClosure.call(GridCacheMapEntry.java:5813)
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(BPlusTree.java:4000)
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$5700(BPlusTree.java:3894)
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:2020)
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1997)
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1997)
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1904)
{code}
* corruption of partition meta also leads to a mismatched exception type in
the pages list (a bare AssertionError), e.g.:
{code:java}
2021-01-29
05:48:41.644[ERROR][db-checkpoint-thread-#307%DPL_GRID%DplGridNodeName%][org.apache.ignite.Ignite]
Critical system error detected. Will be handled accordingly to configured
handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION,
err=java.lang.AssertionError: Missing tails [bucket=250, tails=null,
metaPage=000120ca00002798]]]
java.lang.AssertionError: Missing tails [bucket=250, tails=null,
metaPage=000120ca00002798]
    at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.updateTail(PagesList.java:624)
    at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.mergeNoNext(PagesList.java:1628)
    at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.removeDataPage(PagesList.java:1577)
    at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList$RemoveRowHandler.run(AbstractFreeList.java:318)
    at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList$RemoveRowHandler.run(AbstractFreeList.java:273)
    at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writePage(PageHandler.java:292)
    at org.apache.ignite.internal.processors.cache.persistence.DataStructure.write(DataStructure.java:273)
    at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList.removeDataRowByLink(AbstractFreeList.java:633)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.saveStoreMetadata(GridCacheOffheapManager.java:367)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.lambda$syncMetadata$2(GridCacheOffheapManager.java:288)
    at org.apache.ignite.internal.util.IgniteUtils.lambda$wrapIgniteFuture$3(IgniteUtils.java:11665)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}
reproducer:
[https://github.com/gridgain/apache-ignite/blob/2603e9a01bc1f6033b760ef02ebaba9a8069b84b/modules/core/src/test/java/org/apache/ignite/Reproducer12005.java]
All such exceptions should be passed to DiagnosticProcessor and should carry
the IDs of the possibly corrupted pages, so that those pages can be analyzed
in the PDS.
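
A minimal sketch of what this could look like, not the actual Ignite code: the
bare assertion is replaced by a typed exception that carries the suspect page
IDs and is handed to a diagnostic hook before it propagates. The names
CorruptedPagesException, DiagnosticSink and checkRecycled are hypothetical and
exist only for illustration; the group ID and page ID are taken from the first
log above purely as sample values.

```java
import java.util.ArrayList;
import java.util.List;

public class CorruptionDiagnosticsSketch {
    /** Hypothetical exception that names the pages suspected of corruption. */
    public static class CorruptedPagesException extends RuntimeException {
        private final int grpId;
        private final long[] pageIds;

        public CorruptedPagesException(String msg, int grpId, long... pageIds) {
            super(msg + " [grpId=" + grpId + ", pageIds=" + hex(pageIds) + "]");
            this.grpId = grpId;
            this.pageIds = pageIds.clone();
        }

        public int groupId() { return grpId; }

        public long[] pageIds() { return pageIds.clone(); }

        /** Renders page IDs as hex, the form used in the assertion messages. */
        private static String hex(long[] ids) {
            List<String> out = new ArrayList<>();
            for (long id : ids)
                out.add(Long.toHexString(id));
            return out.toString();
        }
    }

    /** Hypothetical stand-in for the component that dumps diagnostics for the reported pages. */
    public interface DiagnosticSink {
        void onCorruption(CorruptedPagesException e);
    }

    /** Replaces a bare assert with a typed failure that is reported before it propagates. */
    public static long checkRecycled(int grpId, long pageId, boolean recycledCorrectly, DiagnosticSink sink) {
        if (!recycledCorrectly) {
            CorruptedPagesException e = new CorruptedPagesException(
                "Incorrectly recycled pageId in reuse bucket", grpId, pageId);
            sink.onCorruption(e); // The sink sees the page IDs even if a caller later wraps the exception.
            throw e;
        }
        return pageId;
    }

    public static void main(String[] args) {
        List<String> reported = new ArrayList<>();
        try {
            checkRecycled(-782612924, 0xff011e9e000012f7L, false, x -> reported.add(x.getMessage()));
        }
        catch (CorruptedPagesException e) {
            System.out.println("Reported: " + reported);
        }
    }
}
```

With a typed exception of this kind, a caller such as the B+Tree code could
rethrow it unchanged instead of wrapping it as a tree corruption, which is the
mismatch shown in the first log.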
> Improve diagnostic capabilities of persistence corruptions
> ----------------------------------------------------------
>
> Key: IGNITE-15227
> URL: https://issues.apache.org/jira/browse/IGNITE-15227
> Project: Ignite
> Issue Type: Improvement
> Reporter: Denis Chudov
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.3.4#803005)