[jira] [Updated] (IGNITE-17793) Historical rebalance must use HWM instead of LWM to seek the proper checkpoint

Anton Vinogradov (Jira) Fri, 30 Sep 2022 10:47:12 -0700


     [ https://issues.apache.org/jira/browse/IGNITE-17793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Anton Vinogradov updated IGNITE-17793:
--------------------------------------
    Labels: iep-31 ise  (was: )

> Historical rebalance must use HWM instead of LWM to seek the proper checkpoint
> ------------------------------------------------------------------------------
>
>                 Key: IGNITE-17793
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17793
>             Project: Ignite
>          Issue Type: Sub-task
>            Reporter: Anton Vinogradov
>            Priority: Major
>              Labels: iep-31, ise
>
> Currently, historical rebalance in
> {{CheckpointHistory#searchEarliestWalPointer}} seeks the newest
> checkpoint whose counter is less than the lowest entry that has to be rebalanced.
> Unfortunately, this has two problems:
> 1) There may be more than one checkpoint with the same counter, and the
> newest one cannot be used as a rebalance start point.
> For example, we have a partition with LWM=100, some gaps and HWM=200.
> A checkpoint will have counter == 100.
> Then we may close some gaps, excluding 101 (so LWM stays at 100).
> Again, the next checkpoint will have counter == 100.
> The newest checkpoint marked with counter 100 will not cover all committed
> entries with counter > 100: entries with counter > 100 that were applied
> before it are not replayed.
> After the rebalance finishes, we'll see the warning "Some partition entries
> were missed during historical rebalance" and an inconsistent cluster state.
> 2) After a cluster restart, we may have checkpoints before some counter,
> but none of them can be used for rebalancing.
> For example, we again have a partition with LWM=100, some gaps and HWM=200.
> After the cluster restarts, the first checkpoint is marked with counter == 100.
> But this single checkpoint does not cover some committed entries with
> counter > 100.
> A possible solution is to use HWM instead of LWM during the search.
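Both failure modes and the proposed fix can be reproduced in a toy model. This is a sketch under stated assumptions, not Ignite code. The WAL is a Python list of hypothetical `Update` and `Checkpoint` records, and each checkpoint records both the partition LWM and HWM. A historical rebalance replays every update logged after the chosen checkpoint. `search_by_lwm` mimics the current rule described for `CheckpointHistory#searchEarliestWalPointer`. `search_by_hwm` is one possible reading of "use HWM instead of LWM": accept a checkpoint only if its HWM is below the lowest counter to rebalance, so no such entry could have been applied before it. Entry 101 is never applied, so LWM stays at 100.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Union


@dataclass(frozen=True)
class Update:
    counter: int  # partition update counter of an applied entry (toy model)


@dataclass(frozen=True)
class Checkpoint:
    lwm: int  # low watermark: every counter <= lwm has been applied
    hwm: int  # high watermark: no applied counter exceeds hwm


Record = Union[Update, Checkpoint]


def newest_checkpoint(wal: List[Record], ok: Callable[[Checkpoint], bool]) -> Optional[int]:
    """Index of the newest checkpoint accepted by `ok`, or None if there is none."""
    for i in range(len(wal) - 1, -1, -1):
        rec = wal[i]
        if isinstance(rec, Checkpoint) and ok(rec):
            return i
    return None


def search_by_lwm(wal: List[Record], lowest_needed: int) -> Optional[int]:
    # Current rule: newest checkpoint whose counter (LWM) is below the lowest entry to rebalance.
    return newest_checkpoint(wal, lambda cp: cp.lwm < lowest_needed)


def search_by_hwm(wal: List[Record], lowest_needed: int) -> Optional[int]:
    # Assumed reading of the fix: no entry >= lowest_needed was applied before the checkpoint.
    return newest_checkpoint(wal, lambda cp: cp.hwm < lowest_needed)


def replayed(wal: List[Record], start: int) -> Set[int]:
    """Counters a historical rebalance would read from the WAL after checkpoint `start`."""
    return {r.counter for r in wal[start + 1:] if isinstance(r, Update)}


# Point 1: two checkpoints with LWM=100; all gaps except 101 are closed between them.
scenario1: List[Record] = (
    [Update(c) for c in range(1, 101)]
    + [Checkpoint(lwm=100, hwm=100)]                        # index 100
    + [Update(150), Update(200), Checkpoint(lwm=100, hwm=200)]
    + [Update(c) for c in range(102, 200) if c != 150]      # close gaps, keep 101 open
    + [Checkpoint(lwm=100, hwm=200)]                        # newest checkpoint
)
# Entries > 100 that the demander must receive (102..200).
committed = {r.counter for r in scenario1 if isinstance(r, Update) and r.counter > 100}

lwm_start = search_by_lwm(scenario1, lowest_needed=101)
hwm_start = search_by_hwm(scenario1, lowest_needed=101)
print(len(committed - replayed(scenario1, lwm_start)))          # 99: every entry is missed
print(hwm_start, committed <= replayed(scenario1, hwm_start))  # 100 True

# Point 2: after a restart the retained history starts at a checkpoint with LWM=100
# but HWM=200, because entries 150 and 200 were applied before it.
scenario2: List[Record] = [Checkpoint(lwm=100, hwm=200)] + [Update(c) for c in range(102, 150)]
print(search_by_lwm(scenario2, 101), search_by_hwm(scenario2, 101))  # 0 None
```

In this model the HWM rule can pick an older checkpoint (index 100 instead of the newest one), so more WAL has to be replayed. In point 2 it returns `None`. A caller would then have to treat historical rebalance as impossible, presumably falling back to full rebalance, instead of silently losing entries.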



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
