[jira] [Commented] (IGNITE-13193) Implement fallback to full partition rebalancing in case historical supplier failed to read all necessary data updates from WAL

Vladislav Pyatkov (Jira) Thu, 02 Jul 2020 05:26:06 -0700


    [ https://issues.apache.org/jira/browse/IGNITE-13193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17150228#comment-17150228 ]


Vladislav Pyatkov commented on IGNITE-13193:
--------------------------------------------

[~slava.koptilin] I left three comments in the PR.

Please take a look at them.

> Implement fallback to full partition rebalancing in case historical supplier 
> failed to read all necessary data updates from WAL
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-13193
>                 URL: https://issues.apache.org/jira/browse/IGNITE-13193
>             Project: Ignite
>          Issue Type: Improvement
>    Affects Versions: 2.8.1
>            Reporter: Vyacheslav Koptilin
>            Assignee: Vyacheslav Koptilin
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Historical rebalance may fail for several reasons:
> 1) The WAL on the supplier node is corrupted. In the current implementation, 
> the supplier triggers a failure handler.
> 2) After iterating over the WAL, the demander node has not received all the 
> updates needed to bring a MOVING partition up to date (the resulting update 
> counter did not converge with the expected update counter of the OWNING 
> partition). In the current implementation, the demander silently ignores 
> the missing updates.
> Such behavior negatively affects the stability of the cluster: an 
> inappropriate state of historical WAL is not a reason to fail a supplier node.
> A better way to handle this scenario is to:
>  - either try to rebalance the partition historically from another supplier,
>  - or use full partition rebalance for the problem partition.
> Once the supplier fails to provide data from part of the WAL, its 
> corresponding sequence of checkpoints should be marked as inapplicable for 
> historical rebalance in order to prevent further errors.
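
The fallback described in the issue could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the PR: the class, method, and field names are hypothetical and are not Ignite APIs, and the "inapplicable" marker is simplified to a per-partition set of supplier ids rather than a sequence of checkpoints.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: none of these names are Ignite APIs.
class RebalanceFallbackSketch {
    enum Mode { HISTORICAL, FULL }

    /** Planning outcome: rebalance mode and supplier (null for FULL, where the supplier is chosen elsewhere). */
    static final class Plan {
        final Mode mode;
        final String supplierId;

        Plan(Mode mode, String supplierId) {
            this.mode = mode;
            this.supplierId = supplierId;
        }
    }

    /** Partition id -> suppliers whose WAL history is marked inapplicable for that partition. */
    private final Map<Integer, Set<String>> inapplicable = new HashMap<>();

    /** Demander-side check. Assumption: reaching the expected counter means the partition is up to date. */
    static boolean converged(long resultingCntr, long expectedCntr) {
        return resultingCntr >= expectedCntr;
    }

    /** Records that a supplier could not provide the required WAL range for a partition. */
    void markInapplicable(int partId, String supplierId) {
        inapplicable.computeIfAbsent(partId, k -> new HashSet<>()).add(supplierId);
    }

    /** Picks the first historical supplier not marked inapplicable; otherwise falls back to full rebalance. */
    Plan plan(int partId, List<String> historicalSuppliers) {
        Set<String> excluded = inapplicable.getOrDefault(partId, new HashSet<>());

        for (String supplierId : historicalSuppliers) {
            if (!excluded.contains(supplierId))
                return new Plan(Mode.HISTORICAL, supplierId);
        }

        return new Plan(Mode.FULL, null);
    }
}
```

In this sketch, a demander that finds `converged(...)` to be false after a historical attempt would call `markInapplicable(...)` for that supplier and then call `plan(...)` again, so the partition is either retried against another historical supplier or rebalanced in full instead of failing the supplier node.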



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
