Anton Vinogradov created IGNITE-17793:
-----------------------------------------
Summary: Historical rebalance must use HWM instead of LWM to seek
the proper checkpoint
Key: IGNITE-17793
URL: https://issues.apache.org/jira/browse/IGNITE-17793
Project: Ignite
Issue Type: Sub-task
Reporter: Anton Vinogradov
Currently, historical rebalance at
{{CheckpointHistory#searchEarliestWalPointer}} searches for the newest checkpoint
whose counter is less than the lowest entry that has to be rebalanced.
Unfortunately,
1) We may have more than one checkpoint with the same counter, and in that case
it's impossible to use the newest one as a rebalance start point.
For example, we have a partition with LWM=100, some gaps and HWM=200.
The checkpoint will have counter == 100.
Then we may close some gaps, excluding 101 (to keep LWM == 100).
And again, the checkpoint will have counter == 100.
The newest checkpoint marked with counter 100 will not contain all committed
entries with counter > 100.
And after the rebalance finishes, we'll see a warning "Some partition entries
were missed during historical rebalance" and an inconsistent cluster state.
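To make the LWM/HWM behaviour in this example concrete, here is a minimal
sketch of an update counter with gaps. It is a toy model, not Ignite's actual
counter implementation; the class {{GapCounterSketch}} and its methods are
hypothetical names used for illustration only.

```java
import java.util.TreeSet;

/** Toy partition update counter with gaps (hypothetical, not Ignite code). */
class GapCounterSketch {
    /** Low watermark: every counter <= lwm has been applied. */
    private long lwm;

    /** Counters applied out of order, all strictly greater than lwm. */
    private final TreeSet<Long> outOfOrder = new TreeSet<>();

    GapCounterSketch(long initial) {
        lwm = initial;
    }

    /** Applies an update and advances LWM over any contiguous run. */
    void apply(long cntr) {
        if (cntr <= lwm)
            return; // Already covered by LWM.

        outOfOrder.add(cntr);

        while (outOfOrder.remove(lwm + 1))
            lwm++;
    }

    long lwm() {
        return lwm;
    }

    /** High watermark: the highest counter applied so far. */
    long hwm() {
        return outOfOrder.isEmpty() ? lwm : outOfOrder.last();
    }

    public static void main(String[] args) {
        GapCounterSketch c = new GapCounterSketch(100);

        c.apply(150);
        c.apply(200);
        System.out.println("LWM=" + c.lwm() + " HWM=" + c.hwm()); // LWM=100 HWM=200

        // Close the gaps 102..149, but not 101.
        for (long i = 102; i < 150; i++)
            c.apply(i);
        System.out.println("LWM=" + c.lwm() + " HWM=" + c.hwm()); // LWM=100 HWM=200

        // Only closing 101 lets LWM move.
        c.apply(101);
        System.out.println("LWM=" + c.lwm() + " HWM=" + c.hwm()); // LWM=150 HWM=200
    }
}
```

In this model, any checkpoint taken before 101 is applied would be marked with
counter 100 if LWM is used, no matter how many entries above 100 were already
committed.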
2) After the cluster restart, we may face a situation where we have checkpoints
before some counter but none of them can be used for rebalancing.
For example, we again have a partition with LWM=100, some gaps and HWM=200.
The cluster is restarted, and the first checkpoint is marked with counter == 100.
But this single checkpoint does not contain some committed entries with counter
> 100.
A possible solution is to use HWM instead of LWM during the search.
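
The difference between the two searches can be sketched on a toy WAL. This is
only one reading of the proposal, not the actual
{{CheckpointHistory#searchEarliestWalPointer}} code: it assumes that each
checkpoint records both LWM and HWM, and that a checkpoint is a valid start
point only if its mark is below the first counter to be rebalanced. The class
{{CheckpointSearchSketch}}, the {{Rec}} type and all method names are
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy WAL comparing LWM- and HWM-based checkpoint search (not Ignite code). */
class CheckpointSearchSketch {
    /** WAL record: an applied update, or a checkpoint marker carrying LWM and HWM. */
    static final class Rec {
        final boolean checkpoint;
        final long cntr; // Update counter, meaningful for updates only.
        final long lwm;  // Meaningful for checkpoints only.
        final long hwm;  // Meaningful for checkpoints only.

        private Rec(boolean checkpoint, long cntr, long lwm, long hwm) {
            this.checkpoint = checkpoint;
            this.cntr = cntr;
            this.lwm = lwm;
            this.hwm = hwm;
        }

        static Rec update(long cntr) {
            return new Rec(false, cntr, 0, 0);
        }

        static Rec checkpoint(long lwm, long hwm) {
            return new Rec(true, 0, lwm, hwm);
        }
    }

    /** Index of the newest checkpoint whose mark is below firstNeeded, or -1. */
    static int search(List<Rec> wal, long firstNeeded, boolean useHwm) {
        for (int i = wal.size() - 1; i >= 0; i--) {
            Rec r = wal.get(i);

            if (r.checkpoint && (useHwm ? r.hwm : r.lwm) < firstNeeded)
                return i;
        }

        return -1;
    }

    /** Needed counters logged before pos, i.e. absent from a replay starting at pos. */
    static TreeSet<Long> missed(List<Rec> wal, int pos, long firstNeeded) {
        TreeSet<Long> res = new TreeSet<>();

        for (int i = 0; i < pos; i++) {
            Rec r = wal.get(i);

            if (!r.checkpoint && r.cntr >= firstNeeded)
                res.add(r.cntr);
        }

        return res;
    }

    /** WAL for case 1: three checkpoints, all with LWM == 100. */
    static List<Rec> exampleWal() {
        List<Rec> wal = new ArrayList<>();

        wal.add(Rec.checkpoint(100, 100)); // Index 0: no gaps yet.
        wal.add(Rec.update(150));
        wal.add(Rec.update(200));
        wal.add(Rec.checkpoint(100, 200)); // Index 3: gaps above 100.
        wal.add(Rec.update(120));
        wal.add(Rec.update(130));
        wal.add(Rec.checkpoint(100, 200)); // Index 6: some gaps closed, 101 still open.
        wal.add(Rec.update(101));

        return wal;
    }

    public static void main(String[] args) {
        List<Rec> wal = exampleWal();

        int byLwm = search(wal, 101, false);
        int byHwm = search(wal, 101, true);

        // Prints: LWM start=6 missed=[120, 130, 150, 200]
        System.out.println("LWM start=" + byLwm + " missed=" + missed(wal, byLwm, 101));

        // Prints: HWM start=0 missed=[]
        System.out.println("HWM start=" + byHwm + " missed=" + missed(wal, byHwm, 101));
    }
}
```

Under the same assumptions, case 2 corresponds to a WAL whose only checkpoint
is {{Rec.checkpoint(100, 200)}}: the LWM-based search returns that checkpoint
even though it cannot serve counters above 100, while the HWM-based search
returns -1, so the caller at least knows that no usable checkpoint exists.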
--
This message was sent by Atlassian Jira
(v8.20.10#820010)