Anton Vinogradov created IGNITE-17793:
-----------------------------------------

             Summary: Historical rebalance must use HWM instead of LWM to seek 
the proper checkpoint
                 Key: IGNITE-17793
                 URL: https://issues.apache.org/jira/browse/IGNITE-17793
             Project: Ignite
          Issue Type: Sub-task
            Reporter: Anton Vinogradov


Currently, historical rebalance at 
{{CheckpointHistory#searchEarliestWalPointer}} searches for the newest checkpoint 
whose counter is less than that of the lowest entry that has to be rebalanced.

Unfortunately, 

1) We may have more than one checkpoint with the same counter, and in that 
case it's impossible to use the newest one as a rebalance start point.

For example, we have a partition with LWM=100, some gaps and HWM=200.
The checkpoint will have counter == 100.
Then we may close some gaps, excluding 101 (to keep LWM == 100).
And again, the checkpoint will have counter == 100.
The newest checkpoint marked with counter 100 will not contain all committed 
entries with counter > 100.
And after the rebalance finishes, we'll see the warning "Some partition entries 
were missed during historical rebalance" and an inconsistent cluster state.

2) After a cluster restart, we may face a situation where we have checkpoints 
before some counter, but none of them can be used for rebalancing.

For example, we again have a partition with LWM=100, some gaps and HWM=200.
We restart the cluster, and the first checkpoint is marked with counter == 100.
But this single checkpoint does not contain some committed entries with counter 
> 100.

A possible solution is to use HWM instead of LWM during the search.
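
The difference between the two search keys can be illustrated with a small 
standalone model. This is only a sketch under simplifying assumptions, not 
Ignite code: the {{Checkpoint}} class, the {{WAL}} array and the 
{{searchByLwm}} / {{searchByHwm}} / {{missed}} helpers are hypothetical names 
invented for the example. The WAL is modelled as a plain array of update 
counters, and each checkpoint remembers the WAL position plus the LWM and HWM 
at the moment it was taken.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HistoricalRebalanceSketch {
    /** Hypothetical checkpoint marker: a WAL position plus the partition counters at that moment. */
    public static final class Checkpoint {
        public final int walIdx; // index of the first WAL record written after the checkpoint
        public final long lwm;
        public final long hwm;

        public Checkpoint(int walIdx, long lwm, long hwm) {
            this.walIdx = walIdx;
            this.lwm = lwm;
            this.hwm = hwm;
        }
    }

    /** Update counters in the order they reached the WAL: 200, 150 and 120 arrive before 101. */
    public static final long[] WAL = {99, 100, 200, 150, 120, 101};

    /** Checkpoints, oldest first. The last two share LWM == 100 because 101 is still missing. */
    public static final List<Checkpoint> HISTORY = Arrays.asList(
        new Checkpoint(2, 100, 100),
        new Checkpoint(4, 100, 200),
        new Checkpoint(5, 100, 200));

    /** Newest checkpoint whose LWM does not exceed the demander's counter (models the LWM-based search). */
    public static Checkpoint searchByLwm(List<Checkpoint> history, long demanderCntr) {
        Checkpoint res = null;
        for (Checkpoint cp : history)
            if (cp.lwm <= demanderCntr)
                res = cp;
        return res;
    }

    /** Newest checkpoint whose HWM does not exceed the demander's counter (models the proposed search). */
    public static Checkpoint searchByHwm(List<Checkpoint> history, long demanderCntr) {
        Checkpoint res = null;
        for (Checkpoint cp : history)
            if (cp.hwm <= demanderCntr)
                res = cp;
        return res;
    }

    /** Counters above demanderCntr that sit before the checkpoint in the WAL, so a replay from it skips them. */
    public static List<Long> missed(long[] wal, Checkpoint cp, long demanderCntr) {
        List<Long> res = new ArrayList<>();
        for (int i = 0; i < cp.walIdx && i < wal.length; i++)
            if (wal[i] > demanderCntr)
                res.add(wal[i]);
        return res;
    }

    public static void main(String[] args) {
        long demanderCntr = 100; // the demander already has everything up to 100

        // Case 1: several checkpoints share the same LWM.
        System.out.println("LWM search misses: " + missed(WAL, searchByLwm(HISTORY, demanderCntr), demanderCntr));
        System.out.println("HWM search misses: " + missed(WAL, searchByHwm(HISTORY, demanderCntr), demanderCntr));

        // Case 2: after a restart only one checkpoint exists, taken with LWM=100 and HWM=200.
        List<Checkpoint> afterRestart = Arrays.asList(new Checkpoint(5, 100, 200));
        System.out.println("HWM search finds a checkpoint after restart: "
            + (searchByHwm(afterRestart, demanderCntr) != null));
    }
}
```

In this model the LWM-based search picks the newest checkpoint and a replay 
from it skips counters 200, 150 and 120, while the HWM-based search picks the 
oldest checkpoint and skips nothing. In the restart case the HWM-based search 
finds no suitable checkpoint at all, which in this sketch would be the signal 
that historical rebalance cannot be used for the partition.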



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
