[
https://issues.apache.org/jira/browse/IGNITE-12429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ivan Rakov updated IGNITE-12429:
--------------------------------
Affects Version/s: 2.7
2.7.5
2.7.6
> Rework bytes-based WAL archive size management logic to make historical
> rebalance more predictable
> --------------------------------------------------------------------------------------------------
>
> Key: IGNITE-12429
> URL: https://issues.apache.org/jira/browse/IGNITE-12429
> Project: Ignite
> Issue Type: Improvement
> Affects Versions: 2.7, 2.7.5, 2.7.6
> Reporter: Ivan Rakov
> Priority: Major
>
> Since 2.7, DataStorageConfiguration allows specifying the size of the WAL
> archive in bytes (see DataStorageConfiguration#maxWalArchiveSize), which is
> much more transparent to the user.
> Unfortunately, the new logic may be unpredictable when it comes to historical
> rebalance. The WAL archive is truncated when either of the following
> conditions occurs:
> 1. The total number of checkpoints in the WAL archive exceeds
> DataStorageConfiguration#walHistSize
> 2. The total size of the WAL archive exceeds
> DataStorageConfiguration#maxWalArchiveSize
> Independently, the in-memory checkpoint history contains only a fixed number
> of the most recent checkpoints (can be changed with
> IGNITE_PDS_MAX_CHECKPOINT_MEMORY_HISTORY_SIZE, 100 by default).
> Together, these properties make it hard for the user to control the use of
> historical rebalance. Imagine a case where the user has a light load (WAL is
> rotated very slowly) and the default checkpoint frequency. After 100 * 3 = 300
> minutes, updates in the WAL become impossible to receive via historical
> rebalance even if:
> 1. The user has configured a large DataStorageConfiguration#maxWalArchiveSize
> 2. The user has configured a large DataStorageConfiguration#walHistSize
> At the same time, setting a large IGNITE_PDS_MAX_CHECKPOINT_MEMORY_HISTORY_SIZE
> will help (only in combination with the previous two points), but Ignite node
> heap usage may increase dramatically.
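The 300-minute figure above follows from multiplying the in-memory history limit by the checkpoint interval. A minimal sketch of that arithmetic, assuming a 3-minute checkpoint interval as implied by the "100 * 3" in the description; the class and method names are hypothetical and not part of Ignite:

```java
import java.time.Duration;

public final class CheckpointHistoryWindow {
    /**
     * Approximate time span covered by the in-memory checkpoint history:
     * number of retained checkpoints multiplied by the checkpoint interval.
     */
    public static Duration covered(int maxHistoryEntries, Duration checkpointInterval) {
        return checkpointInterval.multipliedBy(maxHistoryEntries);
    }

    public static void main(String[] args) {
        // 100 retained checkpoints, one checkpoint every 3 minutes (assumed).
        Duration window = covered(100, Duration.ofMinutes(3));
        System.out.println(window.toMinutes() + " minutes"); // prints "300 minutes"
    }
}
```

Under this model, a larger WAL archive does not widen the window; only the history limit or the checkpoint interval does.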
> I propose changing the WAL history management logic in the following way:
> 1. *Don't* cut the WAL archive when the number of checkpoints exceeds
> DataStorageConfiguration#walHistSize. WAL history should be managed only
> based on DataStorageConfiguration#maxWalArchiveSize.
> 2. Checkpoint history should contain a fixed number of entries, but should
> cover the whole stored WAL archive (not only its most recent part with the
> last IGNITE_PDS_MAX_CHECKPOINT_MEMORY_HISTORY_SIZE checkpoints). This can be
> achieved by making checkpoint history sparse: some intermediate checkpoints
> *may be absent from history*, but a fixed number of checkpoints can be
> positioned either in a uniform distribution (trying to keep a fixed number of
> bytes between two neighbouring checkpoints) or exponentially (trying to keep
> a fixed ratio between (size of WAL from checkpoint(N-1) to current write
> pointer) and (size of WAL from checkpoint(N) to current write pointer)).
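The two placement strategies in point 2 could be sketched roughly as follows. This is an illustration under assumptions, not Ignite code: checkpoints are modelled as ascending WAL byte offsets, and the class `SparseCheckpointHistory` and its methods are hypothetical. Each strategy computes target offsets and keeps the nearest existing checkpoint for each target, so the result may hold fewer than `maxEntries` entries when two targets map to the same checkpoint.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public final class SparseCheckpointHistory {
    /** Keeps checkpoints spaced by a roughly fixed number of WAL bytes. */
    public static List<Long> uniform(List<Long> offsets, int maxEntries) {
        if (maxEntries < 2) throw new IllegalArgumentException("maxEntries must be >= 2");
        if (offsets.size() <= maxEntries) return new ArrayList<>(offsets);
        long oldest = offsets.get(0);
        long newest = offsets.get(offsets.size() - 1);
        TreeSet<Long> kept = new TreeSet<>();
        for (int i = 0; i < maxEntries; i++) {
            double target = oldest + (double) (newest - oldest) * i / (maxEntries - 1);
            kept.add(nearest(offsets, target));
        }
        return new ArrayList<>(kept);
    }

    /** Keeps checkpoints whose distances to the write pointer grow by a roughly fixed ratio. */
    public static List<Long> exponential(List<Long> offsets, long writePtr, int maxEntries) {
        if (maxEntries < 2) throw new IllegalArgumentException("maxEntries must be >= 2");
        if (offsets.size() <= maxEntries) return new ArrayList<>(offsets);
        // Distances from the write pointer to the newest and oldest checkpoints.
        double minDist = Math.max(1, writePtr - offsets.get(offsets.size() - 1));
        double maxDist = Math.max(minDist, writePtr - offsets.get(0));
        double ratio = Math.pow(maxDist / minDist, 1.0 / (maxEntries - 1));
        TreeSet<Long> kept = new TreeSet<>();
        double dist = minDist;
        for (int i = 0; i < maxEntries; i++) {
            kept.add(nearest(offsets, writePtr - dist));
            dist *= ratio;
        }
        return new ArrayList<>(kept);
    }

    /** Linear scan for the offset closest to the target; the earliest offset wins ties. */
    private static long nearest(List<Long> offsets, double target) {
        long best = offsets.get(0);
        for (long o : offsets) {
            if (Math.abs(o - target) < Math.abs(best - target)) best = o;
        }
        return best;
    }
}
```

With checkpoints every 100 bytes from 0 to 1000, a write pointer at 1100 and three entries, the uniform variant keeps offsets 0, 500 and 1000, while the exponential variant keeps 0, 800 and 1000, giving denser coverage near the write pointer.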