[jira] [Updated] (IGNITE-12617) PME-free switch should wait for recovery only at affected nodes.

Anton Vinogradov (Jira) Thu, 30 Apr 2020 09:26:13 -0700


     [ 
https://issues.apache.org/jira/browse/IGNITE-12617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Anton Vinogradov updated IGNITE-12617:
--------------------------------------
    Description: 
Since IGNITE-9913, new-topology operations are allowed immediately after 
cluster-wide recovery has finished.

But is there any reason to wait for cluster-wide recovery if only one node 
has failed?
In this case, we should recover only the failed node's backups.
Unfortunately, {{RendezvousAffinityFunction}} tends to spread a node's backup 
partitions across the whole cluster. In that case we obviously have to wait 
for cluster-wide recovery on switch.

But what if only some of the nodes hold the backups for every primary?

If nodes are combined into virtual cells where, for each partition, the backups 
are located in the same cell as the primary, it is possible to finish the 
switch outside the affected cell before tx recovery finishes.
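
A minimal sketch of why cells matter, assuming a simplified model that is not 
Ignite code: the class {{AffectedNodes}}, the method {{affectedBy}} and the 
node names are hypothetical. It lists the nodes that own at least one 
partition together with a failed node, first for an assignment that spreads 
backups over the cluster and then for one that keeps them inside cells.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model, not Ignite code: finds nodes sharing partitions with a failed node. */
final class AffectedNodes {
    /**
     * @param assignment partition id -> owners (primary first, then backups)
     * @param failed     id of the failed node
     * @return surviving nodes that own at least one partition together with the failed node
     */
    static Set<String> affectedBy(Map<Integer, List<String>> assignment, String failed) {
        Set<String> affected = new TreeSet<>();
        for (List<String> owners : assignment.values()) {
            if (owners.contains(failed)) {
                affected.addAll(owners);
            }
        }
        affected.remove(failed);
        return affected;
    }

    public static void main(String[] args) {
        // Backups spread over the whole cluster: every surviving node is affected.
        Map<Integer, List<String>> spread = Map.of(
            0, List.of("A", "B"),
            1, List.of("A", "C"),
            2, List.of("D", "A"),
            3, List.of("B", "C"));

        // Two cells {A, B} and {C, D}: the owners of a partition never cross a cell.
        Map<Integer, List<String>> cells = Map.of(
            0, List.of("A", "B"),
            1, List.of("B", "A"),
            2, List.of("C", "D"),
            3, List.of("D", "C"));

        System.out.println(affectedBy(spread, "A")); // [B, C, D]
        System.out.println(affectedBy(cells, "A"));  // [B]
    }
}
```

With the cell layout, only node B has to take part in recovering A's 
transactions, so C and D have nothing to wait for.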

This optimization will allow us to start and even finish new operations outside 
the failed cell without waiting for the cluster-wide switch to finish (that is, 
for the broken cell's recovery).

In other words, the switch (in the left/fail + baseline + rebalanced case) will 
have little effect on the latency of operations not related to the failed cell.

To summarize:
- We should wait for tx recovery before finishing the switch only in the broken 
cell.
- We should wait for tx recovery of replicated caches everywhere, since every 
node is a backup of the failed one.
- Upcoming operations related to the broken cell (including all operations on 
replicated caches) will require the cluster-wide switch to finish before they 
are processed.
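
The three rules above can be sketched as two predicates. This is only an 
illustration under assumptions: {{SwitchRules}}, its {{CacheMode}} enum, the 
method names and the integer cell ids are hypothetical and are not taken from 
the Ignite code base.

```java
/** Hypothetical model of the rules above; none of these names are Ignite APIs. */
final class SwitchRules {
    enum CacheMode { PARTITIONED, REPLICATED }

    /**
     * Rules 1 and 2: should a node in {@code nodeCell} wait for tx recovery of a cache
     * with the given mode before it finishes the switch?
     */
    static boolean mustWaitForRecovery(CacheMode mode, int nodeCell, int brokenCell) {
        // Replicated caches: every node is a backup of the failed one, so all nodes wait.
        // Partitioned caches: only nodes of the broken cell wait.
        return mode == CacheMode.REPLICATED || nodeCell == brokenCell;
    }

    /** Rule 3: must an upcoming operation wait for the cluster-wide switch to finish? */
    static boolean needsClusterWideSwitch(CacheMode mode, int operationCell, int brokenCell) {
        return mode == CacheMode.REPLICATED || operationCell == brokenCell;
    }

    public static void main(String[] args) {
        int brokenCell = 1;
        // Partitioned cache, node in a healthy cell: no waiting.
        System.out.println(mustWaitForRecovery(CacheMode.PARTITIONED, 2, brokenCell));    // false
        // Partitioned cache, node in the broken cell: waits.
        System.out.println(mustWaitForRecovery(CacheMode.PARTITIONED, 1, brokenCell));    // true
        // Replicated cache: waits everywhere.
        System.out.println(mustWaitForRecovery(CacheMode.REPLICATED, 2, brokenCell));     // true
        // Operation on a partitioned cache in a healthy cell proceeds right away.
        System.out.println(needsClusterWideSwitch(CacheMode.PARTITIONED, 2, brokenCell)); // false
    }
}
```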

  was:
Since IGNITE-9913, new-topology operations are allowed immediately after 
cluster-wide recovery has finished.

But is there any reason to wait for cluster-wide recovery if only one node 
has failed?
In this case, we should recover only the failed node's backups.
Unfortunately, {{RendezvousAffinityFunction}} tends to spread a node's backup 
partitions across the whole cluster. In that case we obviously have to perform 
cluster-wide recovery on switch.

But what if only some of the nodes hold the backups for every primary?

If nodes are combined into virtual cells where, for each partition, the backups 
are located in the same cell as the primary, it is possible to finish the 
switch outside the affected cell before tx recovery finishes.

This optimization will allow us to start and even finish new operations outside 
the failed cell without waiting for the cluster-wide switch to finish.

In other words, the switch (in the left/fail + baseline + rebalanced case) will 
have little effect on the latency of operations not related to the failed cell.

Assumptions
- We should wait for tx recovery before finishing the global switch.
- We should wait for recovery of replicated caches globally before finishing 
the switch locally.
- Upcoming operations on replicated caches and operations related to the broken 
cell will require the cluster-wide switch to finish before they are committed.


> PME-free switch should wait for recovery only at affected nodes.
> ----------------------------------------------------------------
>
>                 Key: IGNITE-12617
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12617
>             Project: Ignite
>          Issue Type: Task
>            Reporter: Anton Vinogradov
>            Assignee: Anton Vinogradov
>            Priority: Major
>              Labels: iep-45
>             Fix For: 2.9
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Since IGNITE-9913, new-topology operations are allowed immediately after 
> cluster-wide recovery has finished.
> But is there any reason to wait for cluster-wide recovery if only one node 
> has failed?
> In this case, we should recover only the failed node's backups.
> Unfortunately, {{RendezvousAffinityFunction}} tends to spread a node's 
> backup partitions across the whole cluster. In that case we obviously have to 
> wait for cluster-wide recovery on switch.
> But what if only some of the nodes hold the backups for every primary?
> If nodes are combined into virtual cells where, for each partition, the 
> backups are located in the same cell as the primary, it is possible to finish 
> the switch outside the affected cell before tx recovery finishes.
> This optimization will allow us to start and even finish new operations 
> outside the failed cell without waiting for the cluster-wide switch to finish 
> (that is, for the broken cell's recovery).
> In other words, the switch (in the left/fail + baseline + rebalanced case) 
> will have little effect on the latency of operations not related to the 
> failed cell.
> To summarize:
> - We should wait for tx recovery before finishing the switch only in the 
> broken cell.
> - We should wait for tx recovery of replicated caches everywhere, since every 
> node is a backup of the failed one.
> - Upcoming operations related to the broken cell (including all operations on 
> replicated caches) will require the cluster-wide switch to finish before they 
> are processed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
