[
https://issues.apache.org/jira/browse/IGNITE-12429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992226#comment-16992226
]
Alexei Scherbakov edited comment on IGNITE-12429 at 12/10/19 7:10 AM:
----------------------------------------------------------------------
[~ivan.glukos]
I have some objections.
1. I don't think this is right. Having the ability to specify history in
checkpoints is the same as setting a duration equal to checkpointFreq *
walHistSize.
To me, this is a good thing to have. Probably we should change the property to
be measured in time units, or just add a javadoc explaining how this is
translated to a duration.
2. For me, the root cause is the wrong treatment of histMap when calculating
the available history for reservation.
We already have a caching mechanism for checkpoint entries [1].
It looks like it's possible to keep all the history in the heap (actually
storing only references) using lazy loading/unloading when needed and get rid
of IGNITE_PDS_MAX_CHECKPOINT_MEMORY_HISTORY_SIZE (or maybe use it as a hint
for caching).
Also, I do not understand how having a sparse map will help us, because we
need all entries for the history calculation.
[1]
org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry.GroupStateLazyStore
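The lazy-loading idea above can be sketched roughly as follows. This is an illustrative sketch, not Ignite code: all names (LazyCheckpointHistory, Entry, onCheckpoint) are hypothetical, and the "group state" is stubbed as an Object. The point is that every checkpoint stays in the history map as a cheap reference, while the heavy per-group state is loaded on demand and can be dropped again, so the map no longer needs a hard size cap.

```java
import java.util.TreeMap;
import java.util.function.LongFunction;

/**
 * Illustrative sketch (not Ignite code): keep every checkpoint in the
 * history map as a lightweight reference and load its heavy group state
 * lazily, dropping it after use. All names are hypothetical.
 */
class LazyCheckpointHistory {
    /** Lightweight entry: only the WAL pointer lives on-heap permanently. */
    static final class Entry {
        final long walPtr;
        private Object groupState; // loaded on demand, may be dropped

        Entry(long walPtr) { this.walPtr = walPtr; }

        /** Loads the heavy state on first access, e.g. from a checkpoint marker file. */
        Object groupState(LongFunction<Object> loader) {
            if (groupState == null)
                groupState = loader.apply(walPtr);
            return groupState;
        }

        /** Frees heap once the entry is no longer reserved. */
        void unload() { groupState = null; }
    }

    /** Checkpoint timestamp -> lightweight entry; unbounded, unlike the capped histMap. */
    private final TreeMap<Long, Entry> histMap = new TreeMap<>();

    void onCheckpoint(long cpTs, long walPtr) { histMap.put(cpTs, new Entry(walPtr)); }

    int size() { return histMap.size(); }
}
```

Under this scheme, IGNITE_PDS_MAX_CHECKPOINT_MEMORY_HISTORY_SIZE could bound only the number of *loaded* group states (a cache hint), not the number of entries visible for history calculation.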
> Rework bytes-based WAL archive size management logic to make historical
> rebalance more predictable
> --------------------------------------------------------------------------------------------------
>
> Key: IGNITE-12429
> URL: https://issues.apache.org/jira/browse/IGNITE-12429
> Project: Ignite
> Issue Type: Improvement
> Affects Versions: 2.7, 2.7.5, 2.7.6
> Reporter: Ivan Rakov
> Priority: Major
>
> Since 2.7, DataStorageConfiguration allows specifying the size of the WAL
> archive in bytes (see DataStorageConfiguration#maxWalArchiveSize), which is
> much more transparent to the user.
> Unfortunately, the new logic may be unpredictable when it comes to historical
> rebalance. The WAL archive is truncated when one of the following conditions
> occurs:
> 1. The total number of checkpoints in the WAL archive is bigger than
> DataStorageConfiguration#walHistSize
> 2. The total size of the WAL archive is bigger than
> DataStorageConfiguration#maxWalArchiveSize
> Independently, the in-memory checkpoint history contains only a fixed number
> of the last checkpoints (this can be changed with
> IGNITE_PDS_MAX_CHECKPOINT_MEMORY_HISTORY_SIZE; 100 by default).
> All these particular qualities make it hard for the user to control the usage
> of historical rebalance. Imagine the case where the user has a light load
> (WAL gets rotated very slowly) and the default checkpoint frequency. After
> 100 * 3 = 300 minutes, it will be impossible to receive any updates in the
> WAL via historical rebalance, even if:
> 1. The user has configured a large DataStorageConfiguration#maxWalArchiveSize
> 2. The user has configured a large DataStorageConfiguration#walHistSize
> At the same time, setting a large IGNITE_PDS_MAX_CHECKPOINT_MEMORY_HISTORY_SIZE
> will help (only combined with the previous two points), but Ignite node heap
> usage may increase dramatically.
> I propose to change WAL history management logic in the following way:
> 1. *Don't cut* the WAL archive when the number of checkpoints exceeds
> DataStorageConfiguration#walHistSize. WAL history should be managed only
> based on DataStorageConfiguration#maxWalArchiveSize.
> 2. Checkpoint history should contain a fixed number of entries, but should
> cover the whole stored WAL archive (not only its most recent part with the
> IGNITE_PDS_MAX_CHECKPOINT_MEMORY_HISTORY_SIZE last checkpoints). This can be
> achieved by making the checkpoint history sparse: some intermediate
> checkpoints *may not be present in history*, but the fixed number of
> checkpoints can be positioned either uniformly (trying to keep a fixed number
> of bytes between two neighbour checkpoints) or exponentially (trying to keep
> a fixed ratio between [size of WAL from checkpoint(N-1) to the current write
> pointer] and [size of WAL from checkpoint(N) to the current write pointer]).
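The uniform variant of point 2 can be sketched as follows. This is an illustrative sketch, not Ignite code: the names (SparseCheckpointHistory, thinUniform) and the representation of checkpoints as plain WAL byte offsets are assumptions. Given more checkpoints than the fixed budget, it retains the entries closest to evenly spaced byte targets across the archived WAL span; the exponential variant would instead space the targets by a fixed ratio of distances to the current write pointer.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch (not Ignite code): thin a list of checkpoint WAL
 * offsets down to a fixed budget while keeping the byte spacing between
 * the retained entries roughly uniform.
 */
public class SparseCheckpointHistory {
    /** Keep at most {@code budget} offsets, spaced ~evenly over the WAL span. */
    static List<Long> thinUniform(List<Long> walOffsets, int budget) {
        int n = walOffsets.size();
        if (n <= budget)
            return new ArrayList<>(walOffsets);

        List<Long> kept = new ArrayList<>(budget);
        long first = walOffsets.get(0);
        long span = walOffsets.get(n - 1) - first;

        int src = 0;
        for (int i = 0; i < budget; i++) {
            // Target byte offset for the i-th retained checkpoint.
            long target = first + span * i / (budget - 1);

            // Advance to the first checkpoint at or beyond the target.
            while (src < n - 1 && walOffsets.get(src) < target)
                src++;

            kept.add(walOffsets.get(src));
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> offsets = new ArrayList<>();
        for (long i = 0; i <= 100; i++)
            offsets.add(i * 1024); // a checkpoint every 1 KiB of WAL

        // The oldest and newest checkpoints are always retained.
        System.out.println(thinUniform(offsets, 5));
    }
}
```

Note that the oldest retained checkpoint bounds how far back historical rebalance can reach, so both variants keep the first and last entries and only drop intermediate ones.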
--
This message was sent by Atlassian Jira
(v8.3.4#803005)