[
https://issues.apache.org/jira/browse/IGNITE-17793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Anton Vinogradov updated IGNITE-17793:
--------------------------------------
Labels: iep-31 ise (was: )
> Historical rebalance must use HWM instead of LWM to seek the proper checkpoint
> ------------------------------------------------------------------------------
>
> Key: IGNITE-17793
> URL: https://issues.apache.org/jira/browse/IGNITE-17793
> Project: Ignite
> Issue Type: Sub-task
> Reporter: Anton Vinogradov
> Priority: Major
> Labels: iep-31, ise
>
> Currently, historical rebalance at
> {{CheckpointHistory#searchEarliestWalPointer}} seeks the newest
> checkpoint whose counter is less than the lowest entry that has to be rebalanced.
> Unfortunately,
> 1) We may have more than one checkpoint with the same counter, and it's
> impossible to use the newest one as a rebalance start point.
> For example, we have a partition with LWM=100, some gaps and HWM=200.
> Checkpoint will have the counter == 100.
> Then we may close some gaps, excluding 101 (to keep LWM == 100).
> And again, checkpoint will have counter == 100.
> The newest checkpoint marked with counter 100 will not contain all committed
> entries with counter > 100.
> And after the rebalance finishes, we'll see a warning "Some partition entries
> were missed during historical rebalance" and inconsistent cluster state.
> 2) After the cluster restart, we may face a situation where we have
> checkpoints before some counter but none of them can be used for rebalancing.
> For example, we, again, have a partition with LWM=100, some gaps and HWM=200.
> After the cluster restarts, the first checkpoint is marked with counter == 100.
> But this single checkpoint does not contain some committed entries with
> counter > 100.
> Possible solution is to use HWM instead of LWM during the search.
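
The two cases above can be reproduced with a small toy model. This is only a sketch under stated assumptions: the class, the `Checkpoint` type, and the `search` and `missed` helpers are hypothetical and are not Ignite's actual `CheckpointHistory` code. The numbers are scaled down (HWM=105 instead of 200), and the WAL is modelled as a plain list of update counters in write order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToLongFunction;

/** Toy model only: none of these types or methods exist in Apache Ignite. */
public class HistoricalRebalanceSketch {
    /** Checkpoint marker: WAL position plus the partition counters at that moment. */
    static final class Checkpoint {
        final String name;
        final int walIdx; // number of WAL records written before this checkpoint
        final long lwm;   // low watermark: highest counter with no gaps below it
        final long hwm;   // high watermark: highest counter applied so far

        Checkpoint(String name, int walIdx, long lwm, long hwm) {
            this.name = name;
            this.walIdx = walIdx;
            this.lwm = lwm;
            this.hwm = hwm;
        }
    }

    /** Newest checkpoint whose marker (LWM or HWM) does not exceed fromCntr, or null if none. */
    static Checkpoint search(List<Checkpoint> hist, long fromCntr, ToLongFunction<Checkpoint> marker) {
        Checkpoint res = null;

        for (Checkpoint cp : hist) { // history is ordered from oldest to newest
            if (marker.applyAsLong(cp) <= fromCntr)
                res = cp;
        }

        return res;
    }

    /** Counters above fromCntr written before cp, which a WAL replay starting at cp cannot see. */
    static List<Long> missed(List<Long> wal, Checkpoint cp, long fromCntr) {
        List<Long> res = new ArrayList<>();

        for (int i = 0; i < cp.walIdx; i++) {
            if (wal.get(i) > fromCntr)
                res.add(wal.get(i));
        }

        return res;
    }

    /** Updates 103 and 105 arrive out of order, then gaps 102 and 104 close; 101 stays open. */
    static final List<Long> WAL = Arrays.asList(99L, 100L, 103L, 105L, 102L, 104L);

    static final List<Checkpoint> HIST = Arrays.asList(
        new Checkpoint("cp0", 2, 100, 100),  // before any gap appears
        new Checkpoint("cp1", 4, 100, 105),  // LWM=100, gaps, HWM=105
        new Checkpoint("cp2", 6, 100, 105)); // some gaps closed, LWM is still 100

    public static void main(String[] args) {
        Checkpoint byLwm = search(HIST, 100, cp -> cp.lwm);
        Checkpoint byHwm = search(HIST, 100, cp -> cp.hwm);

        System.out.println(byLwm.name + " misses " + missed(WAL, byLwm, 100)); // cp2 misses [103, 105, 102, 104]
        System.out.println(byHwm.name + " misses " + missed(WAL, byHwm, 100)); // cp0 misses []
    }
}
```

In this model, case 2 corresponds to a history that holds only `cp1`: the LWM search still returns it even though a replay from it misses 103 and 105, while the HWM search returns `null`, which signals that no checkpoint is usable for historical rebalance from counter 100.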