[
https://issues.apache.org/jira/browse/IGNITE-12617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Scherbakov updated IGNITE-12617:
---------------------------------------
Reviewer: Alexey Scherbakov
> PME-free switch should wait for recovery only at affected nodes.
> ----------------------------------------------------------------
>
> Key: IGNITE-12617
> URL: https://issues.apache.org/jira/browse/IGNITE-12617
> Project: Ignite
> Issue Type: Task
> Reporter: Anton Vinogradov
> Assignee: Anton Vinogradov
> Priority: Major
> Labels: iep-45
> Fix For: 2.9
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Since IGNITE-9913, new-topology operations are allowed immediately after
> cluster-wide recovery has finished.
> But is there any reason to wait for a cluster-wide recovery if only one node
> failed?
> In this case, we should recover only the failed node's backups.
> Unfortunately, {{RendezvousAffinityFunction}} tends to spread a node's
> backup partitions across the whole cluster. In that case we obviously have to
> wait for cluster-wide recovery on switch.
> But what if only some nodes are backups for each primary?
> If nodes are combined into virtual cells where, for each partition, the
> backups are located in the same cell as the primary, it is possible to finish
> the switch outside the affected cell before tx recovery finishes.
> This optimization will allow us to start and even finish new operations
> outside the failed cell without waiting for the cluster-wide switch to finish
> (that is, for the broken cell to recover).
> In other words, the switch (on node left/fail, with baseline topology and a
> rebalanced cluster) will have little effect on the latency of operations not
> related to the failed cell.
> To summarize:
> - We should wait for tx recovery before finishing the switch only in the
> broken cell.
> - We should wait for tx recovery of replicated caches everywhere, since every
> node is a backup of the failed one.
> - Upcoming operations related to the broken cell (including all operations on
> replicated caches) will be processed only after the cluster-wide switch
> finishes.
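
A rough illustration of the rule in the list above, as a minimal sketch only. It is not Ignite code: the class CellSwitchSketch, its methods, the string node ids and the two sample assignments are all hypothetical, and an "affected" node is assumed to be any node that shares at least one partition with the failed node.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not Ignite code: node ids are plain strings and an
// assignment maps a partition id to its owners (primary first, then backups).
final class CellSwitchSketch {
    /** Nodes that own a copy of at least one partition also owned by the failed node. */
    static Set<String> affectedNodes(Map<Integer, List<String>> assignment, String failed) {
        Set<String> affected = new TreeSet<>();
        for (List<String> owners : assignment.values()) {
            if (!owners.contains(failed))
                continue; // Partition is untouched by the failure.
            for (String node : owners) {
                if (!node.equals(failed))
                    affected.add(node);
            }
        }
        return affected;
    }

    /** A replicated cache makes every node a backup, so everyone waits. */
    static boolean mustWaitForRecovery(String node, Set<String> affected, boolean replicatedCache) {
        return replicatedCache || affected.contains(node);
    }

    /** Assumed sample: two cells {A, B} and {C, D}, backups stay inside the cell. */
    static Map<Integer, List<String>> cellAssignment() {
        Map<Integer, List<String>> m = new HashMap<>();
        m.put(0, Arrays.asList("A", "B"));
        m.put(1, Arrays.asList("B", "A"));
        m.put(2, Arrays.asList("C", "D"));
        m.put(3, Arrays.asList("D", "C"));
        return m;
    }

    /** Assumed sample: backups of A are spread over every other node. */
    static Map<Integer, List<String>> spreadAssignment() {
        Map<Integer, List<String>> m = new HashMap<>();
        m.put(0, Arrays.asList("A", "B"));
        m.put(1, Arrays.asList("A", "C"));
        m.put(2, Arrays.asList("D", "A"));
        m.put(3, Arrays.asList("B", "C"));
        return m;
    }

    public static void main(String[] args) {
        // Failure of node A under each sample assignment.
        System.out.println(affectedNodes(cellAssignment(), "A"));
        System.out.println(affectedNodes(spreadAssignment(), "A"));
    }
}
```

With the cell-style sample, only the failed node's cell mate ends up in the affected set, so nodes in the other cell could finish the switch early for partitioned caches. With the spread sample, every surviving node is affected, which corresponds to the cluster-wide wait described for {{RendezvousAffinityFunction}}.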
--
This message was sent by Atlassian Jira
(v8.3.4#803005)