[jira] [Updated] (IGNITE-17738) Cluster must be able to fix the partition inconsistency on restart/node_join by itself

Anton Vinogradov (Jira) Thu, 20 Oct 2022 10:12:08 -0700


     [ 
https://issues.apache.org/jira/browse/IGNITE-17738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Anton Vinogradov updated IGNITE-17738:
--------------------------------------
    Summary: Cluster must be able to fix the partition inconsistency on 
restart/node_join by itself  (was: Cluster must be able to fix the 
inconsistency on restart/node_join by itself)

> Cluster must be able to fix the partition inconsistency on restart/node_join 
> by itself
> --------------------------------------------------------------------------------------
>
>                 Key: IGNITE-17738
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17738
>             Project: Ignite
>          Issue Type: Sub-task
>            Reporter: Anton Vinogradov
>            Assignee: Maxim Muzafarov
>            Priority: Major
>              Labels: iep-31, ise
>             Fix For: 2.15
>
>         Attachments: PartialHistoricalRebalanceTest.java, 
> SkippedRebalanceBecauseOfTheSameLwmTest.java
>
>
> After a cluster restart (caused by a power-off, OOM or some other problem), 
> the PDS may be inconsistent: primary partitions may contain operations that 
> are missing on backups, and counters may contain gaps even on the primary.
> 1) Currently, "historical rebalance" is able to sync the data to the highest 
> LWM for every partition. 
> Most likely, a primary will be chosen as a rebalance source, but the data 
> after the LWM will not be rebalanced. So, all updates between LWM and HWM 
> will not be synchronized.
> See [^PartialHistoricalRebalanceTest.java]
> Such a partition may be rebalanced correctly later, if a full rebalance is 
> ever triggered.
> 2) If the LWM is the same on the primary and the backup, rebalance is 
> skipped for that partition.
> See [^SkippedRebalanceBecauseOfTheSameLwmTest.java]
> Proposals:
> 1) Cheap fix
> A possible solution for the case when the cluster failed and restarted (same 
> baseline) is to fix the counters automatically (when cluster composition is 
> equal to the baseline specified before the crash).
> Counters should be set as
>  - HWM at primary and as LWM at backups for caches with 2+ backups,
>  - LWM at primary and as HWM at backups for caches with a single backup.
> 2) Correct fix
> Rebalance must honor the whole counter state (LWM, HWM, gaps).
> 2.0) Primary HWM must be set to the highest HWM across the copies to avoid 
> reapplying update counters that are already applied on backups.
> 2.1) When WAL is available, all entries between LWM and HWM (inclusive) must 
> be rebalanced to the other nodes where they are required, even from backups 
> to the primary.
> 2.2) Full rebalance must be restricted when it would cause any loss of 
> updates.
> For example, it's
>  - ok to replace B with A when
> A[lwm=100, gaps=[142], hwm=200] and B[lwm=50, gaps=[76,99,111], hwm=120],
> because A contains all of B.
>  - NOT ok to replace B with A when
> A[lwm=100, gaps=[142], hwm=200] and B[lwm=50, gaps=[76,99,111], hwm=148],
> because update *142* would be lost.
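
The check described in 2.2 can be sketched in Java roughly as follows. This is only an illustration, not Ignite code: the `CounterState` class and the `lostUpdates` and `canReplace` methods are hypothetical names. It assumes a simplified model in which a copy holds every update from 1 to its HWM except the counters listed in `gaps`; the LWM is carried along but not used by the check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class FullRebalanceCheck {
    /** Hypothetical snapshot of the update counter state of one partition copy. */
    static final class CounterState {
        final long lwm;
        final Set<Long> gaps;
        final long hwm;

        CounterState(long lwm, Set<Long> gaps, long hwm) {
            this.lwm = lwm;
            this.gaps = gaps;
            this.hwm = hwm;
        }

        /** Assumption: an update is present if it is at or below HWM and is not a gap. */
        boolean contains(long cntr) {
            return cntr >= 1 && cntr <= hwm && !gaps.contains(cntr);
        }
    }

    /** Returns the updates that the target holds but the source does not. */
    static List<Long> lostUpdates(CounterState src, CounterState target) {
        List<Long> lost = new ArrayList<>();

        // Gaps of the source that the target has already applied.
        for (long gap : new TreeSet<>(src.gaps)) {
            if (target.contains(gap))
                lost.add(gap);
        }

        // Updates the target applied above the source HWM.
        for (long cntr = src.hwm + 1; cntr <= target.hwm; cntr++) {
            if (target.contains(cntr))
                lost.add(cntr);
        }

        return lost;
    }

    /** Full rebalance from src to target is safe only if nothing is lost. */
    static boolean canReplace(CounterState src, CounterState target) {
        return lostUpdates(src, target).isEmpty();
    }

    public static void main(String[] args) {
        CounterState a = new CounterState(100, Set.of(142L), 200);
        CounterState b1 = new CounterState(50, Set.of(76L, 99L, 111L), 120);
        CounterState b2 = new CounterState(50, Set.of(76L, 99L, 111L), 148);

        System.out.println(canReplace(a, b1));  // true
        System.out.println(lostUpdates(a, b2)); // [142]
    }
}
```

With the two examples from the description, replacing B with A is allowed when B's HWM is 120 and rejected when it is 148, because update 142 exists only on B.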



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
