[
https://issues.apache.org/jira/browse/IGNITE-11749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dmitriy Govorukhin reassigned IGNITE-11749:
-------------------------------------------
Assignee: Anton Kalashnikov
> Implement automatic pages history dump on CorruptedTreeException
> ----------------------------------------------------------------
>
> Key: IGNITE-11749
> URL: https://issues.apache.org/jira/browse/IGNITE-11749
> Project: Ignite
> Issue Type: Improvement
> Reporter: Alexey Goncharuk
> Assignee: Anton Kalashnikov
> Priority: Major
> Fix For: 2.8
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, the only way to debug suspected bugs in the checkpointer/recovery
> mechanics is to parse WAL files manually after the corruption has happened.
> This is impractical for two reasons. First, it requires manual actions that
> depend on the content of the exception. Second, it is not always possible to
> obtain the WAL files (they may contain sensitive data).
> We need to add to the exception handler a mechanism that dumps all the
> information required for primary analysis of the corruption. For example, if
> an exception happened when materializing a link {{0xabcd}} written on an
> index page {{0xdcba}}, we need to dump the change history of both pages and
> the checkpoint records within the analysis interval. Possibly, we should also
> include the FreeList pages in which the aforementioned pages were included.
> Example of output:
> {noformat}
> [2019-05-07 11:57:57,350][INFO
> ][test-runner-#58%diagnostic.DiagnosticProcessorTest%][PageHistoryDiagnoster]
> Next WAL record :: PageSnapshot [fullPageId = FullPageId
> [pageId=0002ffff00000000, effectivePageId=0000ffff00000000,
> grpId=-2100569601], page = [
> Header [
> type=11 (PageMetaIO),
> ver=1,
> crc=0,
> pageId=844420635164672(offset=0, flags=10, partId=65535, index=0)
> ],
> PageMeta[
> treeRoot=844420635164675,
> lastSuccessfulFullSnapshotId=0,
> lastSuccessfulSnapshotId=0,
> nextSnapshotTag=1,
> lastSuccessfulSnapshotTag=0,
> lastAllocatedPageCount=0,
> candidatePageCount=0
> ]],
> super = [WALRecord [size=4129, chainSize=0, pos=FileWALPointer [idx=0,
> fileOff=103, len=4129], type=PAGE_RECORD]]]
> Next WAL record :: CheckpointRecord
> [cpId=c6ba7793-113b-4b54-8530-45e1708ca44c, end=false, cpMark=FileWALPointer
> [idx=0, fileOff=29, len=29], super=WALRecord [size=1963, chainSize=0,
> pos=FileWALPointer [idx=0, fileOff=39686, len=1963], type=CHECKPOINT_RECORD]]
> Next WAL record :: PageSnapshot [fullPageId = FullPageId
> [pageId=0002ffff00000000, effectivePageId=0000ffff00000000,
> grpId=-1368047378], page = [
> Header [
> type=11 (PageMetaIO),
> ver=1,
> crc=0,
> pageId=844420635164672(offset=0, flags=10, partId=65535, index=0)
> ],
> PageMeta[
> treeRoot=844420635164675,
> lastSuccessfulFullSnapshotId=0,
> lastSuccessfulSnapshotId=0,
> nextSnapshotTag=1,
> lastSuccessfulSnapshotTag=0,
> lastAllocatedPageCount=0,
> candidatePageCount=0
> ]],
> super = [WALRecord [size=4129, chainSize=0, pos=FileWALPointer [idx=0,
> fileOff=55961, len=4129], type=PAGE_RECORD]]]
> Next WAL record :: CheckpointRecord
> [cpId=145e599e-66fc-45f5-bde4-b0c392125968, end=false, cpMark=null,
> super=WALRecord [size=21409, chainSize=0, pos=FileWALPointer [idx=0,
> fileOff=13101788, len=21409], type=CHECKPOINT_RECORD]]
> {noformat}
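
When reading a dump like the one above, it helps to see how the hexadecimal pageId relates to the decimal form printed in the page header. The sketch below is illustrative only: the class and method names are hypothetical (not Ignite's API), and the bit layout (32-bit index, 16-bit partition ID, 8-bit flags, 8-bit offset) is an assumption inferred from the sample output, where the flags value appears to be printed in binary.

```java
// Hypothetical helper, not part of Ignite. The bit layout is an assumption
// inferred from the sample output: index in bits 0-31, partId in bits 32-47,
// flags in bits 48-55, offset in bits 56-63.
final class PageIdDecoder {
    private PageIdDecoder() {
    }

    /** Lower 32 bits: index of the page within its partition. */
    static int pageIndex(long pageId) {
        return (int) pageId;
    }

    /** Bits 32-47: partition ID (65535 in the sample output). */
    static int partId(long pageId) {
        return (int) ((pageId >>> 32) & 0xFFFF);
    }

    /** Bits 48-55: page flags. */
    static int flags(long pageId) {
        return (int) ((pageId >>> 48) & 0xFF);
    }

    /** Bits 56-63: offset. */
    static int offset(long pageId) {
        return (int) ((pageId >>> 56) & 0xFF);
    }

    /** Clears the flags and offset bytes, keeping partition ID and index. */
    static long effectivePageId(long pageId) {
        return pageId & 0x0000FFFFFFFFFFFFL;
    }

    /** Formats a page ID like the header line of the sample output (flags in binary). */
    static String describe(long pageId) {
        return String.format("%d(offset=%d, flags=%s, partId=%d, index=%d)",
            pageId, offset(pageId), Integer.toBinaryString(flags(pageId)),
            partId(pageId), pageIndex(pageId));
    }

    public static void main(String[] args) {
        // pageId taken from the PageSnapshot records in the sample output.
        long pageId = 0x0002ffff00000000L;

        System.out.println(describe(pageId));
        System.out.println(String.format("%016x", effectivePageId(pageId)));
    }
}
```

Under these assumptions, pageId=0002ffff00000000 decodes to the decimal 844420635164672 shown in the header, and masking out the top two bytes yields the effectivePageId 0000ffff00000000 from the same record.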