[
https://issues.apache.org/jira/browse/IGNITE-21135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Maksim Timonin updated IGNITE-21135:
------------------------------------
Description:
It should be guaranteed that `CdcManager` handles all WAL records in the same
order as they were written to the WAL.
Currently, this guarantee is broken due to contention between the methods
`#afterBinaryMemoryRestore` and `#collect`.
Scenario of the contention, in which `MemoryRecoveryRecord` is handled before
the records that precede it:
When an Ignite node starts, it restores binary memory in
`GridCacheDatabaseSharedManager#restoreBinaryMemory`. In this method:
# `FileWriteAheadLogManager` resumes logging and starts the
`wal-segment-syncer` thread.
# `MemoryRecoveryRecord` is written.
# Callbacks `DatabaseLifecycleListener#afterBinaryMemoryRestore` are called.
# Implementations of `CdcManager` must extend `DatabaseLifecycleListener`,
so the `CdcManager#afterBinaryMemoryRestore` method is invoked.
# The background `wal-segment-syncer` thread invokes the `CdcManager#collect`
method.
# Only after that does `CdcManager#afterBinaryMemoryRestore` finish handling
historical records.
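The call order in this scenario can be replayed with a small toy model. This is only a sketch, not Ignite code: `ToyCdcManager`, `finishRestore`, `replayBrokenOrder` and the `DataRecord-N` names are hypothetical stand-ins, and the interleaving is replayed on a single thread so that the result is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model, not Ignite code: replays the call order from the scenario on one thread. */
public class CdcOrderingRace {
    /** Hypothetical stand-in for a CdcManager; it only remembers what it handled. */
    static class ToyCdcManager {
        final List<String> handled = new ArrayList<>();

        /** Stand-in for CdcManager#collect: handles a freshly written WAL record. */
        void collect(String rec) {
            handled.add(rec);
        }

        /** Stand-in for the tail of #afterBinaryMemoryRestore: handles historical records. */
        void finishRestore(List<String> historical) {
            handled.addAll(historical);
        }
    }

    /** Replays steps 2 to 6 of the scenario and returns the order in which records were handled. */
    static List<String> replayBrokenOrder(List<String> historical) {
        ToyCdcManager cdc = new ToyCdcManager();

        String written = "MemoryRecoveryRecord"; // step 2: the record is written to the WAL
        // steps 3-4: afterBinaryMemoryRestore has been invoked but has not finished yet
        cdc.collect(written);                    // step 5: the syncer thread calls collect
        cdc.finishRestore(historical);           // step 6: historical records are handled last

        return cdc.handled;
    }

    public static void main(String[] args) {
        List<String> historical = List.of("DataRecord-1", "DataRecord-2");
        List<String> walOrder = List.of("DataRecord-1", "DataRecord-2", "MemoryRecoveryRecord");

        System.out.println("WAL order:     " + walOrder);
        System.out.println("Handled order: " + replayBrokenOrder(historical));
    }
}
```

In this model the handled order starts with `MemoryRecoveryRecord`, although it is the last record in the WAL.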
To fix this issue:
# `CdcManager#afterBinaryMemoryRestore` must be invoked before writing the
`MemoryRecoveryRecord`.
# `CdcManager` should then no longer extend `DatabaseLifecycleListener`;
instead, the `#afterBinaryMemoryRestore` method must be invoked directly on the
`CdcManager` (see how it's done for in-memory caches in
`IgniteCacheDatabaseSharedManager#startMemoryRestore`).
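The effect of the proposed ordering can be sketched as follows. This is a toy model under stated assumptions, not the actual patch: `ToyCdcManager`, `restoreWithFix` and the `DataRecord-N` names are hypothetical, and a single-thread executor stands in for the `wal-segment-syncer` thread. Because the restore call returns before the record is written, the syncer cannot hand over `MemoryRecoveryRecord` ahead of the historical records.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Toy model, not Ignite code: restore is called directly, before the record is written. */
public class CdcOrderingFix {
    /** Hypothetical stand-in for a CdcManager; it only remembers what it handled. */
    static class ToyCdcManager {
        final List<String> handled = Collections.synchronizedList(new ArrayList<>());

        /** Stand-in for #afterBinaryMemoryRestore: handles all historical records. */
        void afterBinaryMemoryRestore(List<String> historical) {
            handled.addAll(historical);
        }

        /** Stand-in for #collect: handles a freshly written WAL record. */
        void collect(String rec) {
            handled.add(rec);
        }
    }

    /** Runs the proposed order and returns the order in which records were handled. */
    static List<String> restoreWithFix(List<String> historical) {
        ToyCdcManager cdc = new ToyCdcManager();
        // Stand-in for the background wal-segment-syncer thread.
        ExecutorService syncer = Executors.newSingleThreadExecutor();

        // Direct, synchronous call: it returns before anything new is written.
        cdc.afterBinaryMemoryRestore(historical);

        // Only now is the record written and picked up by the syncer.
        String written = "MemoryRecoveryRecord";
        syncer.execute(() -> cdc.collect(written));

        syncer.shutdown();
        try {
            if (!syncer.awaitTermination(5, TimeUnit.SECONDS))
                throw new IllegalStateException("syncer did not finish in time");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        return new ArrayList<>(cdc.handled);
    }

    public static void main(String[] args) {
        System.out.println(restoreWithFix(List.of("DataRecord-1", "DataRecord-2")));
    }
}
```

The ordering here does not depend on thread timing: the `collect` task is submitted only after the restore call has returned.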
was:
`CdcManager` must guarantee that its `#afterBinaryMemoryRestore` method
finishes before the first call of `#collect`.
In practice, however, there is contention between these methods. Scenario of
the contention:
When an Ignite node starts, it restores binary memory in
`GridCacheDatabaseSharedManager#restoreBinaryMemory`. In this method:
# `FileWriteAheadLogManager` resumes logging and starts the
`wal-segment-syncer` thread.
# `MemoryRecoveryRecord` is written.
# Callbacks `DatabaseLifecycleListener#afterBinaryMemoryRestore` are called.
# Implementations of `CdcManager` must extend `DatabaseLifecycleListener`,
so the `CdcManager#afterBinaryMemoryRestore` method is invoked.
# The background `wal-segment-syncer` thread invokes the `CdcManager#collect`
method.
# Only after that does `CdcManager#afterBinaryMemoryRestore` finish.
To fix this issue:
# `CdcManager#afterBinaryMemoryRestore` must be invoked before writing the
`MemoryRecoveryRecord`.
# `CdcManager` should then no longer extend `DatabaseLifecycleListener`;
instead, the `#afterBinaryMemoryRestore` method must be invoked directly on the
`CdcManager` (see how it's done for in-memory caches in
`IgniteCacheDatabaseSharedManager#startMemoryRestore`).
> CdcManager might collect WAL data before restoring its state
> ------------------------------------------------------------
>
> Key: IGNITE-21135
> URL: https://issues.apache.org/jira/browse/IGNITE-21135
> Project: Ignite
> Issue Type: Bug
> Reporter: Maksim Timonin
> Assignee: Maksim Timonin
> Priority: Major
> Labels: ise
> Fix For: 2.17
>
> Time Spent: 10m
> Remaining Estimate: 0h
>