[jira] [Updated] (IGNITE-21135) CdcManager might collect WAL data before restoring its state

Maksim Timonin (Jira) Fri, 22 Dec 2023 05:55:03 -0800


     [ 
https://issues.apache.org/jira/browse/IGNITE-21135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Maksim Timonin updated IGNITE-21135:
------------------------------------
    Description: 
It should be guaranteed that `CdcManager` handles all WAL records in the same 
order in which they were written to the WAL.

To guarantee that, on Ignite start `CdcManager` should first read WAL records 
from disk, and only after that start handling WAL records written at 
runtime.

Currently, this guarantee is broken because the `MemoryRecoveryRecord` might be 
handled before reading records from disk has finished.

Scenario:

When an Ignite node starts, it restores binary memory in 
`GridCacheDatabaseSharedManager#restoreBinaryMemory`. In this method:
 # `FileWriteAheadLogManager` resumes logging and starts the 
`wal-segment-syncer` thread.
 # `MemoryRecoveryRecord` is written.
 # Callbacks `DatabaseLifecycleListener#afterBinaryMemoryRestore` are called.
 # Implementations of `CdcManager` must extend `DatabaseLifecycleListener`, 
so the `CdcManager#afterBinaryMemoryRestore` method is invoked.
 # Background `wal-segment-syncer` thread invokes `CdcManager#collect` method.
 # `CdcManager#afterBinaryMemoryRestore` finishes handling historical records 
only after that.
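
The interleaving above can be replayed in a small, deterministic Java model. 
This is only a sketch: `ToyCdcManager`, `handleHistorical` and `brokenStartup` 
are hypothetical names, not Ignite classes or methods, and the background 
`wal-segment-syncer` thread is modeled as a single `collect` call made partway 
through the restore instead of a real thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the startup race. None of these types are Ignite classes. */
final class CdcOrderingSketch {
    static final String RECOVERY = "MemoryRecoveryRecord";

    /** Hypothetical stand-in for a CdcManager: it only remembers the order in which it saw records. */
    static final class ToyCdcManager {
        final List<String> handled = new ArrayList<>();

        /** Models one step of afterBinaryMemoryRestore: one historical record read from disk. */
        void handleHistorical(String rec) { handled.add(rec); }

        /** Models CdcManager#collect, which the background syncer calls for runtime records. */
        void collect(String rec) { handled.add(rec); }
    }

    /** Order in which the records sit in the WAL: historical ones first, then the recovery record. */
    static List<String> writtenOrder(List<String> onDisk) {
        List<String> res = new ArrayList<>(onDisk);
        res.add(RECOVERY);
        return res;
    }

    /**
     * Replays the scenario: the recovery record is already written when the restore starts,
     * and the syncer delivers it after collectAt historical records have been handled
     * (0 <= collectAt <= onDisk.size()).
     */
    static List<String> brokenStartup(List<String> onDisk, int collectAt) {
        ToyCdcManager cdc = new ToyCdcManager();

        for (int i = 0; i < onDisk.size(); i++) {
            if (i == collectAt)
                cdc.collect(RECOVERY); // Step 5: syncer thread wins the race.

            cdc.handleHistorical(onDisk.get(i)); // Steps 4 and 6: restore still in progress.
        }

        if (collectAt >= onDisk.size())
            cdc.collect(RECOVERY); // Lucky timing: syncer runs after the restore.

        return cdc.handled;
    }

    public static void main(String[] args) {
        List<String> onDisk = Arrays.asList("r1", "r2");

        System.out.println("written: " + writtenOrder(onDisk));
        System.out.println("handled: " + brokenStartup(onDisk, 1));
    }
}
```

In this model the handled order matches the written order only when the 
simulated syncer happens to run after the last historical record, which is 
exactly the timing the current code does not guarantee.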

To fix this issue:
 # `CdcManager#afterBinaryMemoryRestore` must be invoked before writing the 
`MemoryRecoveryRecord`.
 # `CdcManager` should then no longer extend `DatabaseLifecycleListener`, and the 
`#afterBinaryMemoryRestore` method must be invoked directly on the `CdcManager` 
(see how it's done for in-memory caches in 
`IgniteCacheDatabaseSharedManager#startMemoryRestore`).
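
The proposed ordering can be sketched the same way. The code below is an 
assumption-laden illustration, not the actual patch: `ToyCdcManager`, `ToyWal`, 
`fixedStartup` and `oldOrderFails` are hypothetical, the toy WAL hands every 
logged record to `collect` immediately (a worst-case stand-in for the syncer 
thread), and the `restored` flag is an extra guard added here only to make an 
ordering violation visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the proposed startup order. None of these types are Ignite classes. */
final class CdcFixSketch {
    static final String RECOVERY = "MemoryRecoveryRecord";

    /** Hypothetical CDC manager that is called directly, not through a lifecycle listener. */
    static final class ToyCdcManager {
        final List<String> handled = new ArrayList<>();
        private boolean restored;

        /** Handles all historical records, then marks the state as restored. */
        void afterBinaryMemoryRestore(List<String> onDisk) {
            handled.addAll(onDisk);
            restored = true;
        }

        /** Guard (an assumption of this sketch): runtime records are rejected before the restore. */
        void collect(String rec) {
            if (!restored)
                throw new IllegalStateException("collect() before restore finished: " + rec);

            handled.add(rec);
        }
    }

    /** Toy WAL: every logged record is passed to the CDC manager straight away. */
    static final class ToyWal {
        private final ToyCdcManager cdc;

        ToyWal(ToyCdcManager cdc) { this.cdc = cdc; }

        void log(String rec) { cdc.collect(rec); }
    }

    /** Proposed order: restore the CDC state first, write the recovery record second. */
    static List<String> fixedStartup(List<String> onDisk) {
        ToyCdcManager cdc = new ToyCdcManager();
        ToyWal wal = new ToyWal(cdc);

        cdc.afterBinaryMemoryRestore(onDisk); // Direct call, before any runtime record exists.
        wal.log(RECOVERY);                    // First runtime record is written only now.

        return cdc.handled;
    }

    /** Old order: the recovery record is written first, so the guard trips. */
    static boolean oldOrderFails() {
        ToyWal wal = new ToyWal(new ToyCdcManager());

        try {
            wal.log(RECOVERY);

            return false;
        }
        catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("handled: " + fixedStartup(Arrays.asList("r1", "r2")));
        System.out.println("old order rejected: " + oldOrderFails());
    }
}
```

Calling the restore method directly makes the ordering explicit in the startup 
code, instead of leaving it to depend on when listener callbacks happen to fire 
relative to the first WAL write.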

 

  was:
It should be guaranteed that `CdcManager` handles all WAL records in the same 
order in which they were written to the WAL.

`CdcManager` handles WAL records with 2 methods:
 # On Ignite start `CdcManager` handles restored WAL records with 
`CdcManager#afterBinaryMemoryRestore`;
 # WAL records written at runtime are handled with the `CdcManager#collect` 
method, called in a background system thread.

For this guarantee to hold, `#afterBinaryMemoryRestore` must finish before any 
new WAL records are written and `collect` is called.

Currently, this guarantee is broken because the `MemoryRecoveryRecord` is written 
before `CdcManager#afterBinaryMemoryRestore` finishes.

Scenario:

When an Ignite node starts, it restores binary memory in 
`GridCacheDatabaseSharedManager#restoreBinaryMemory`. In this method:
 # `FileWriteAheadLogManager` resumes logging and starts the 
`wal-segment-syncer` thread.
 # `MemoryRecoveryRecord` is written.
 # Callbacks `DatabaseLifecycleListener#afterBinaryMemoryRestore` are called.
 # Implementations of `CdcManager` must extend `DatabaseLifecycleListener`, 
so the `CdcManager#afterBinaryMemoryRestore` method is invoked.
 # Background `wal-segment-syncer` thread invokes `CdcManager#collect` method.
 # `CdcManager#afterBinaryMemoryRestore` finishes handling historical records 
only after that.

To fix this issue:
 # `CdcManager#afterBinaryMemoryRestore` must be invoked before writing the 
`MemoryRecoveryRecord`.
 # `CdcManager` should then no longer extend `DatabaseLifecycleListener`, and the 
`#afterBinaryMemoryRestore` method must be invoked directly on the `CdcManager` 
(see how it's done for in-memory caches in 
`IgniteCacheDatabaseSharedManager#startMemoryRestore`).

 


> CdcManager might collect WAL data before restoring its state
> ------------------------------------------------------------
>
>                 Key: IGNITE-21135
>                 URL: https://issues.apache.org/jira/browse/IGNITE-21135
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Maksim Timonin
>            Assignee: Maksim Timonin
>            Priority: Major
>              Labels: ise
>             Fix For: 2.17
>
>          Time Spent: 10m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
