[ https://issues.apache.org/jira/browse/IGNITE-21135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maksim Timonin updated IGNITE-21135:
------------------------------------
Description:
It should be guaranteed that `CdcManager` handles all WAL records in the same
order as they were written to the WAL.
To guarantee that, on Ignite start `CdcManager` should read WAL records from
disk first, and only after that should it start handling WAL records written
at runtime.
Currently, this guarantee is broken because the `MemoryRecoveryRecord` might be
handled before `CdcManager` finishes reading records from disk.
Scenario:
When an Ignite node starts, it restores binary memory in
`GridCacheDatabaseSharedManager#restoreBinaryMemory`. In this method:
# `FileWriteAheadLogManager` resumes logging and starts the
`wal-segment-syncer` thread.
# `MemoryRecoveryRecord` is written.
# The `DatabaseLifecycleListener#afterBinaryMemoryRestore` callbacks are called.
# Since implementations of `CdcManager` must extend `DatabaseLifecycleListener`,
`CdcManager#afterBinaryMemoryRestore` is invoked among them.
# Meanwhile, the background `wal-segment-syncer` thread invokes
`CdcManager#collect`, which handles the `MemoryRecoveryRecord`.
# Only after that does `CdcManager#afterBinaryMemoryRestore` finish handling
historical records.
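The interleaving in this scenario can be reproduced with a small, deterministic Java model. This is a sketch only: `CdcRaceSketch`, `buggyStart` and the record names are hypothetical, the `wal-segment-syncer` is modelled as a plain thread, and latches pin the interleaving of steps 4-6 instead of relying on timing. `historical-1` and `historical-2` stand for records already on disk, so the correct handling order would put both of them before `MemoryRecoveryRecord`.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

// Hypothetical model of the startup race; none of these names are Ignite classes.
class CdcRaceSketch {
    static List<String> buggyStart() {
        List<String> wal = new CopyOnWriteArrayList<>();     // records written at runtime
        List<String> handled = new CopyOnWriteArrayList<>(); // order seen by the CDC model
        CountDownLatch replayStarted = new CountDownLatch(1);
        CountDownLatch collected = new CountDownLatch(1);

        // Step 1: logging resumes and the syncer thread starts; it passes runtime
        // records to collect(). Waiting on replayStarted pins the interleaving.
        Thread syncer = new Thread(() -> {
            await(replayStarted);
            for (String record : wal)
                handled.add(record); // models CdcManager#collect
            collected.countDown();
        }, "wal-segment-syncer");
        syncer.start();

        // Step 2: MemoryRecoveryRecord is written.
        wal.add("MemoryRecoveryRecord");

        // Steps 3-4: afterBinaryMemoryRestore() starts replaying records from disk...
        handled.add("historical-1");
        replayStarted.countDown();

        // Step 5: ...the syncer thread calls collect() in the meantime...
        await(collected);

        // Step 6: ...and only then does the replay of historical records finish.
        handled.add("historical-2");
        return List.copyOf(handled);
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints [historical-1, MemoryRecoveryRecord, historical-2]
        System.out.println(buggyStart());
    }
}
```

Because `collect` runs on its own thread, nothing in the original sequence prevents step 5 from landing between historical records; the latches only make that possible ordering reproducible.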
To fix this issue:
# `CdcManager#afterBinaryMemoryRestore` must be invoked before writing the
`MemoryRecoveryRecord`.
# Consequently, `CdcManager` should no longer extend `DatabaseLifecycleListener`;
instead, `#afterBinaryMemoryRestore` must be invoked directly on the `CdcManager`
(see how this is done for in-memory caches in
`IgniteCacheDatabaseSharedManager#startMemoryRestore`).
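The proposed order can be sketched with the same kind of model. `CdcManagerModel` and `FixedStartSketch` are hypothetical simplifications, and the real `CdcManager#afterBinaryMemoryRestore` signature is not modelled here. The point is only that the replay is a direct call that returns before `MemoryRecoveryRecord` is written, so the already running syncer thread has nothing to collect until all historical records are handled.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

// Simplified, hypothetical stand-in for CdcManager; NOT the real Ignite interface.
interface CdcManagerModel {
    void afterBinaryMemoryRestore(); // replays records that were already on disk
    void collect(String record);     // handles a record written at runtime
}

class FixedStartSketch {
    static List<String> fixedStart() {
        List<String> wal = new CopyOnWriteArrayList<>();     // records written at runtime
        List<String> handled = new CopyOnWriteArrayList<>(); // order seen by the CDC model
        CdcManagerModel cdc = new CdcManagerModel() {
            @Override public void afterBinaryMemoryRestore() {
                handled.add("historical-1");
                handled.add("historical-2");
            }

            @Override public void collect(String record) {
                handled.add(record);
            }
        };
        CountDownLatch recordWritten = new CountDownLatch(1);
        CountDownLatch collected = new CountDownLatch(1);

        // Step 1 is unchanged: logging resumes and the syncer thread starts.
        Thread syncer = new Thread(() -> {
            await(recordWritten);
            for (String record : wal)
                cdc.collect(record);
            collected.countDown();
        }, "wal-segment-syncer");
        syncer.start();

        // Fix: the replay is a direct call on the CDC manager (no
        // DatabaseLifecycleListener callback) and completes before any write.
        cdc.afterBinaryMemoryRestore();

        // Only now is MemoryRecoveryRecord written and handed to the syncer.
        wal.add("MemoryRecoveryRecord");
        recordWritten.countDown();
        await(collected);
        return List.copyOf(handled);
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints [historical-1, historical-2, MemoryRecoveryRecord]
        System.out.println(fixedStart());
    }
}
```

Calling the method directly, rather than through the listener callbacks, is what lets the caller control exactly where the replay sits relative to the first WAL write.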
was:
It should be guaranteed that `CdcManager` handles all WAL records in the same
order as they were written to the WAL.
`CdcManager` handles WAL records with two methods:
# On Ignite start, `CdcManager` handles restored WAL records with
`CdcManager#afterBinaryMemoryRestore`;
# WAL records written at runtime are handled with the `CdcManager#collect`
method, called in a background system thread.
To provide the guarantee, `#afterBinaryMemoryRestore` must therefore finish
before any new WAL records are written and `collect` is called.
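One way to make a violation of this requirement visible during testing is a fail-fast guard. The sketch below is hypothetical (`GuardedCdcManager` is not an Ignite class, and records are plain strings): it refuses `collect` calls until `afterBinaryMemoryRestore` has finished, so the ordering problem surfaces as an exception instead of as silently reordered CDC events. It only detects the problem; it does not replace changing the call order.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical guard (not part of Ignite): runtime records may be collected
// only after all historical records have been replayed.
class GuardedCdcManager {
    // volatile so the syncer thread sees the flag set by the startup thread
    private volatile boolean restoreFinished;

    // Handled records, in handling order; written by the startup and syncer threads.
    private final List<String> handled = new CopyOnWriteArrayList<>();

    void afterBinaryMemoryRestore(List<String> historicalRecords) {
        handled.addAll(historicalRecords);
        restoreFinished = true;
    }

    void collect(String runtimeRecord) {
        if (!restoreFinished) {
            throw new IllegalStateException(
                "collect() called before afterBinaryMemoryRestore() finished: " + runtimeRecord);
        }
        handled.add(runtimeRecord);
    }

    List<String> handled() {
        return List.copyOf(handled);
    }

    public static void main(String[] args) {
        GuardedCdcManager early = new GuardedCdcManager();
        try {
            early.collect("MemoryRecoveryRecord"); // the problematic startup order
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        GuardedCdcManager ok = new GuardedCdcManager();
        ok.afterBinaryMemoryRestore(List.of("historical-1", "historical-2"));
        ok.collect("MemoryRecoveryRecord");
        System.out.println(ok.handled());
    }
}
```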
Currently, this guarantee is broken because the `MemoryRecoveryRecord` is written
before `CdcManager#afterBinaryMemoryRestore` finishes.
Scenario:
When an Ignite node starts, it restores binary memory in
`GridCacheDatabaseSharedManager#restoreBinaryMemory`. In this method:
# `FileWriteAheadLogManager` resumes logging and starts the
`wal-segment-syncer` thread.
# `MemoryRecoveryRecord` is written.
# The `DatabaseLifecycleListener#afterBinaryMemoryRestore` callbacks are called.
# Since implementations of `CdcManager` must extend `DatabaseLifecycleListener`,
`CdcManager#afterBinaryMemoryRestore` is invoked among them.
# Meanwhile, the background `wal-segment-syncer` thread invokes
`CdcManager#collect`, which handles the `MemoryRecoveryRecord`.
# Only after that does `CdcManager#afterBinaryMemoryRestore` finish handling
historical records.
To fix this issue:
# `CdcManager#afterBinaryMemoryRestore` must be invoked before writing the
`MemoryRecoveryRecord`.
# Consequently, `CdcManager` should no longer extend `DatabaseLifecycleListener`;
instead, `#afterBinaryMemoryRestore` must be invoked directly on the `CdcManager`
(see how this is done for in-memory caches in
`IgniteCacheDatabaseSharedManager#startMemoryRestore`).
> CdcManager might collect WAL data before restoring its state
> ------------------------------------------------------------
>
> Key: IGNITE-21135
> URL: https://issues.apache.org/jira/browse/IGNITE-21135
> Project: Ignite
> Issue Type: Bug
> Reporter: Maksim Timonin
> Assignee: Maksim Timonin
> Priority: Major
> Labels: ise
> Fix For: 2.17
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)