[ 
https://issues.apache.org/jira/browse/IGNITE-12429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Rakov updated IGNITE-12429:
--------------------------------
    Affects Version/s: 2.7
                       2.7.5
                       2.7.6

> Rework bytes-based WAL archive size management logic to make historical 
> rebalance more predictable
> --------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-12429
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12429
>             Project: Ignite
>          Issue Type: Improvement
>    Affects Versions: 2.7, 2.7.5, 2.7.6
>            Reporter: Ivan Rakov
>            Priority: Major
>
> Since 2.7 DataStorageConfiguration allows to specify size of WAL archive in 
> bytes (see DataStorageConfiguration#maxWalArchiveSize), which is much more 
> trasparent to user. 
> Unfortunately, new logic may be unpredictable when it comes to the historical 
> rebalance. WAL archive is truncated when one of the following conditions 
> occur:
> 1. Total number of checkpoints in WAL archive is bigger than 
> DataStorageConfiguration#walHistSize
> 2. Total size of WAL archive is bigger than 
> DataStorageConfiguration#maxWalArchiveSize
> Independently, in-memory checkpoint history contains only fixed number of 
> last checkpoints (can be changed with 
> IGNITE_PDS_MAX_CHECKPOINT_MEMORY_HISTORY_SIZE, 100 by default).
> All these particular qualities make it hard for user to cotrol usage of 
> historical rebalance. Imagine the case when user has slight load (WAL gets 
> rotated very slowly) and default checkpoint frequency. After 100 * 3 = 300 
> minutes, all updates in WAL will be impossible to be received via historical 
> rebalance even if:
> 1. User has configured large DataStorageConfiguration#maxWalArchiveSize
> 2. User has configured large DataStorageConfiguration#walHistSize
> At the same time, setting large IGNITE_PDS_MAX_CHECKPOINT_MEMORY_HISTORY_SIZE 
> will help (only with previous two points combined), but Ignite node heap 
> usage may increase dramatically.
> I propose to change WAL history management logic in the following way:
> 1. *Don't* cut WAL archive when number of checkpoint exceeds 
> DataStorageConfiguration#walHistSize. WAL history should be managed only 
> based on DataStorageConfiguration#maxWalArchiveSize.
> 2. Checkpoint history should contain fixed number of entries, but should 
> cover the whole stored WAL archive (not only its more recent part with 
> IGNITE_PDS_MAX_CHECKPOINT_MEMORY_HISTORY_SIZE last checkpoints). This can be 
> achieved by making checkpoint history sparse: some intermediate checkpoints 
> *may be not present in history*, but fixed number of checkpoints can be 
> positioned either in uniform distribution (trying to keep fixed number of 
> bytes between two neighbour checkpoints) or exponentially (trying to keep 
> fixed ratio between (size of WAL from checkpoint(N-1) to current write 
> pointer) and (size of WAL from checkpoint(N) to current write pointer).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to