[ https://issues.apache.org/jira/browse/IGNITE-17738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anton Vinogradov updated IGNITE-17738:
--------------------------------------
    Description: 
On cluster restart (after a power-off or some other failure), the PDS may be 
left inconsistent: primary partitions may contain operations that were missed 
on backups.

Currently, "historical rebalance" can sync the data only up to the highest LWM 
of every partition. 
Most likely, a primary will be chosen as the rebalance source, but the data 
after the LWM will not be rebalanced. So, all updates between the LWM and the 
HWM will remain unsynchronized.
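
A minimal model can make the LWM/HWM gap concrete. It is a hypothetical sketch, not Ignite's actual update-counter implementation. It assumes each partition copy is just the set of applied update counters. LWM is the largest n such that updates 1..n are all applied, and HWM is the largest applied counter. Historical rebalance is modelled as bringing every owner up to the highest LWM. The class and method names are illustrative only.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of partition copies; NOT Ignite's update-counter code.
public class HistoricalRebalanceGap {
    static final class Copy {
        final String node;
        // Update counters whose operations were applied on this copy.
        final TreeSet<Long> applied = new TreeSet<>();

        Copy(String node, long... counters) {
            this.node = node;
            for (long c : counters)
                applied.add(c);
        }

        // LWM: largest n such that updates 1..n are all applied.
        long lwm() {
            long n = 0;
            while (applied.contains(n + 1))
                n++;
            return n;
        }

        // HWM: largest applied update counter (0 if nothing applied).
        long hwm() {
            return applied.isEmpty() ? 0 : applied.last();
        }
    }

    // Models a rebalance that syncs every owner only up to the highest LWM.
    static void rebalanceToHighestLwm(List<Copy> owners) {
        long target = owners.stream().mapToLong(Copy::lwm).max().orElse(0);
        for (Copy c : owners)
            for (long i = 1; i <= target; i++)
                c.applied.add(i);
    }

    public static void main(String[] args) {
        // Primary applied 7 and 8 out of order; update 6 never arrived anywhere.
        Copy primary = new Copy("primary", 1, 2, 3, 4, 5, 7, 8);
        Copy backup = new Copy("backup", 1, 2, 3);
        List<Copy> owners = List.of(primary, backup);

        rebalanceToHighestLwm(owners);

        for (Copy c : owners)
            System.out.println(c.node + " lwm=" + c.lwm() + " hwm=" + c.hwm()
                + " applied=" + c.applied);
    }
}
```

In this model the backup is brought to LWM 5, but updates 7 and 8, which sit between the primary's LWM (5) and HWM (8), never reach it.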

A possible solution for the case when the cluster failed and was restarted 
(with the same baseline) is to adjust the partition counters so that 
"historical rebalance" can perform the sync.

Counters should be set as follows:
 - for caches with 2+ backups: HWM at the primary and LWM at the backups,
 - for caches with a single backup: LWM at the primary and HWM at the backup.
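
The counter rule can be sketched as a small planning step. This is a hypothetical helper, not existing Ignite code. It assumes each copy's own LWM and HWM are known after restart and that "setting the counter" means picking one of those two values per copy. `CopyCounters`, `Assignment` and `plan` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed counter adjustment; NOT an Ignite API.
public class FinalizeOnRestartSketch {
    // Counters observed on one partition copy after restart (assumed inputs).
    record CopyCounters(String node, boolean primary, long lwm, long hwm) {}

    // Counter value a copy would be set to before historical rebalance.
    record Assignment(String node, long counter) {}

    static List<Assignment> plan(int backups, List<CopyCounters> copies) {
        if (backups < 1)
            throw new IllegalArgumentException("The rule only covers caches with backups");

        List<Assignment> res = new ArrayList<>();
        for (CopyCounters c : copies) {
            long counter;
            if (backups >= 2)
                // 2+ backups: primary is set to its HWM, backups to their LWM.
                counter = c.primary() ? c.hwm() : c.lwm();
            else
                // Single backup: primary is set to its LWM, backup to its HWM.
                counter = c.primary() ? c.lwm() : c.hwm();
            res.add(new Assignment(c.node(), counter));
        }
        return res;
    }

    public static void main(String[] args) {
        List<CopyCounters> copies = List.of(
            new CopyCounters("A", true, 5, 8),
            new CopyCounters("B", false, 3, 6),
            new CopyCounters("C", false, 4, 4));

        System.out.println(plan(2, copies));
    }
}
```

With `plan(2, copies)` the primary A is set to 8 and the backups B and C to 3 and 4. The single-backup branch simply inverts the choice as the rule states; the sketch does not attempt to justify that asymmetry.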

This can be implemented as an extension of the `-consistency finalize` 
command, for example, `-consistency finalize-on-restart`.

  was:
On cluster restart (after a power-off or some other failure), the PDS may be 
left inconsistent: primary partitions may contain operations that were missed 
on backups.

Currently, "historical rebalance" can sync the data only up to the highest LWM 
of every partition. 
Most likely, a primary will be chosen as the rebalance source, but the data 
after the LWM will not be rebalanced. So, all updates between the LWM and the 
HWM will remain unsynchronized.

A possible solution for the case when the cluster failed and was restarted 
(with the same baseline) is to adjust the partition counters to help 
"historical rebalance".

Counters should be set as follows:
 - for caches with 2+ backups: HWM at the primary and LWM at the backups,
 - for caches with a single backup: LWM at the primary and HWM at the backup.

This can be implemented as an extension of the `-consistency finalize` 
command, for example, `-consistency finalize-on-restart`.


> Historical rebalance must be able to fix the consistency on cluster restart
> ---------------------------------------------------------------------------
>
>                 Key: IGNITE-17738
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17738
>             Project: Ignite
>          Issue Type: Sub-task
>            Reporter: Anton Vinogradov
>            Priority: Major
>              Labels: ise



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
