[ https://issues.apache.org/jira/browse/IGNITE-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrey Gura reassigned IGNITE-8529:
-----------------------------------
Assignee: Aleksey Plekhanov
> Implement testing framework for checking WAL delta records consistency
> ----------------------------------------------------------------------
>
> Key: IGNITE-8529
> URL: https://issues.apache.org/jira/browse/IGNITE-8529
> Project: Ignite
> Issue Type: New Feature
> Components: persistence
> Reporter: Ivan Rakov
> Assignee: Aleksey Plekhanov
> Priority: Major
> Fix For: 2.6
>
>
> We use sharp checkpointing of page memory in persistent mode. That implies
> that we write two types of records to the write-ahead log: logical (e.g. data
> records) and physical (page snapshots + binary delta records). Physical
> records are applied only when a node crashes or stops during an ongoing
> checkpoint. We have the following invariant: checkpoint #(n-1) + all physical
> records = checkpoint #n.
> If the correctness of physical records is broken, an Ignite node may recover
> with an incorrect page memory state, which in turn can cause unexpected
> delayed errors. However, the consistency of physical records is poorly
> tested: only a small part of our autotests perform node restarts, and even
> fewer of them stop a node while a checkpoint is in progress.
> We should implement an abstract test that:
> 1. Forces a checkpoint and freezes the memory state at the moment of the
> checkpoint.
> 2. Performs the necessary test load.
> 3. Forces a checkpoint again, replays the WAL, and checks that the page store
> at the moment of the previous checkpoint, with all physical records applied,
> exactly equals the current checkpoint state.
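
The three steps above can be sketched as a toy model. This is only an illustration of the invariant, not Ignite code: WalReplaySketch, DeltaRecord, PAGE_SIZE and every method below are hypothetical names, and page memory is assumed to be a plain map from page id to a byte array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of "checkpoint #(n-1) + physical records = checkpoint #n". Not an Ignite API. */
public class WalReplaySketch {
    /** Hypothetical page size, kept tiny for readability. */
    static final int PAGE_SIZE = 16;

    /** Hypothetical physical record: overwrites bytes of one page starting at an offset. */
    static final class DeltaRecord {
        final long pageId;
        final int offset;
        final byte[] bytes;

        DeltaRecord(long pageId, int offset, byte[] bytes) {
            this.pageId = pageId;
            this.offset = offset;
            this.bytes = bytes;
        }
    }

    /** Deep copy of the page map: stands in for "freezing" memory at a checkpoint. */
    static Map<Long, byte[]> snapshot(Map<Long, byte[]> pages) {
        Map<Long, byte[]> copy = new HashMap<>();
        for (Map.Entry<Long, byte[]> e : pages.entrySet())
            copy.put(e.getKey(), e.getValue().clone());
        return copy;
    }

    /** Applies one record to a page map, allocating a zeroed page if needed. */
    static void apply(Map<Long, byte[]> pages, DeltaRecord rec) {
        byte[] page = pages.computeIfAbsent(rec.pageId, id -> new byte[PAGE_SIZE]);
        System.arraycopy(rec.bytes, 0, page, rec.offset, rec.bytes.length);
    }

    /** Replays the log over the previous snapshot and returns ids of pages that differ. */
    static List<Long> replayAndCompare(Map<Long, byte[]> prev, List<DeltaRecord> wal,
        Map<Long, byte[]> cur) {
        Map<Long, byte[]> replayed = snapshot(prev);
        for (DeltaRecord rec : wal)
            apply(replayed, rec);

        Set<Long> ids = new TreeSet<>(replayed.keySet());
        ids.addAll(cur.keySet());

        List<Long> mismatched = new ArrayList<>();
        for (Long id : ids) {
            if (!Arrays.equals(replayed.get(id), cur.get(id)))
                mismatched.add(id);
        }
        return mismatched;
    }

    public static void main(String[] args) {
        Map<Long, byte[]> memory = new HashMap<>();
        memory.put(1L, new byte[PAGE_SIZE]);

        // Step 1: "checkpoint" and freeze the memory state.
        Map<Long, byte[]> frozen = snapshot(memory);

        // Step 2: test load; every change is applied to memory and logged.
        List<DeltaRecord> wal = new ArrayList<>();
        wal.add(new DeltaRecord(1L, 0, new byte[] {1, 2, 3}));
        wal.add(new DeltaRecord(2L, 4, new byte[] {9}));
        for (DeltaRecord rec : wal)
            apply(memory, rec);

        // Step 3: replay the log over the frozen state and compare.
        System.out.println(replayAndCompare(frozen, wal, memory)); // prints []

        // A change that bypasses the log breaks the invariant for page 2.
        memory.get(2L)[0] = 42;
        System.out.println(replayAndCompare(frozen, wal, memory)); // prints [2]
    }
}
```

Returning the list of mismatched page ids, rather than a boolean, gives a later reporting step something concrete to display.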
> Besides checking correctness, the test framework should do the following:
> 1. Gather statistics (such as a histogram) on the types of written physical
> records. That will help us know which types of physical records are covered
> by the test.
> 2. Visualize the expected and actual page state (with all physical records
> applied) if an incorrect page state is detected.
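
A minimal sketch of the two reporting features, under loud assumptions: the RecordType values below are made up for illustration and are not Ignite's actual WAL record types, and "visualization" is reduced to a byte-by-byte text diff of two page buffers.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reporting helpers for a WAL consistency test. Not an Ignite API. */
public class PhysicalRecordReportSketch {
    /** Made-up record types; a real test would use the types actually written to the WAL. */
    enum RecordType { PAGE_SNAPSHOT, INSERT_DELTA, REMOVE_DELTA }

    /** Counts written records per type; types never written stay at 0 so gaps are visible. */
    static Map<RecordType, Integer> histogram(List<RecordType> written) {
        Map<RecordType, Integer> counts = new EnumMap<>(RecordType.class);
        for (RecordType t : RecordType.values())
            counts.put(t, 0);
        for (RecordType t : written)
            counts.merge(t, 1, Integer::sum);
        return counts;
    }

    /** Lists every offset where the expected and actual page bytes differ, in hex. */
    static String diff(byte[] expected, byte[] actual) {
        StringBuilder sb = new StringBuilder();
        int len = Math.max(expected.length, actual.length);
        for (int i = 0; i < len; i++) {
            String exp = i < expected.length ? String.format("%02x", expected[i] & 0xff) : "--";
            String act = i < actual.length ? String.format("%02x", actual[i] & 0xff) : "--";
            if (!exp.equals(act))
                sb.append("offset ").append(i).append(": expected ").append(exp)
                    .append(", actual ").append(act).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<RecordType> written = Arrays.asList(
            RecordType.INSERT_DELTA, RecordType.PAGE_SNAPSHOT, RecordType.INSERT_DELTA);

        // REMOVE_DELTA=0 shows that this load never exercised that record type.
        System.out.println(histogram(written));

        System.out.print(diff(new byte[] {1, 2, 3}, new byte[] {1, 7, 3}));
    }
}
```

Pre-filling the histogram with zeros is a deliberate choice here: an absent type and an uncovered type would otherwise look the same in the output.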
> Regarding implementation, I suppose we can use the checkpoint listener
> mechanism to freeze the page memory state at the moment of a checkpoint.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)