[
https://issues.apache.org/jira/browse/IGNITE-27746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18080883#comment-18080883
]
Sergey Chugunov commented on IGNITE-27746:
------------------------------------------
I have an idea of how to improve this feature.
This patch introduces a new mechanism that performs connection checks in
parallel, alongside the old one, which tests nodes sequentially.
The parallel mechanism may tell the sequential one: there is an alive node
in a remote datacenter, keep testing nodes. But the sequential mechanism is
allowed to check at most 3 nodes before it segments the current node
from the topology.
So if this alive node is more than 3 nodes away, the current node will
segment itself anyway.
We may improve this by integrating the parallel and sequential
mechanisms more tightly.
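To make the idea concrete, here is a minimal sketch of one possible integration. It is not taken from the patch: the class, method and parameter names are hypothetical, and the only value taken from the comment above is the limit of 3 nodes. The assumption is that the parallel check hands the sequential one a single flag, and that the flag lifts the limit.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch; none of these names come from the Ignite code base. */
final class SequentialCheckSketch {
    /** Limit on sequential checks mentioned in the comment above. */
    static final int MAX_SEQUENTIAL_CHECKS = 3;

    /**
     * Walks the nodes that follow the current node in ring order.
     *
     * @param ring nodes after the current node, nearest first
     * @param reachable connection check for a single node
     * @param parallelSawAliveNode true if the parallel check found an alive node
     * @return index of the first reachable node, or -1 if the current node
     *         would segment itself
     */
    static <N> int findNext(List<N> ring, Predicate<N> reachable, boolean parallelSawAliveNode) {
        // Assumption: a positive result from the parallel check lifts the limit,
        // so the sequential pass may go on until it reaches the alive node.
        int limit = parallelSawAliveNode
            ? ring.size()
            : Math.min(MAX_SEQUENTIAL_CHECKS, ring.size());

        for (int i = 0; i < limit; i++) {
            if (reachable.test(ring.get(i)))
                return i;
        }

        return -1;
    }
}
```

With a ring of five nodes where only the fifth is reachable, findNext returns -1 without the hint from the parallel check and 4 with it.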
> MDC. Implement parallel ping of DC2's nodes with the connection recovery.
> -------------------------------------------------------------------------
>
> Key: IGNITE-27746
> URL: https://issues.apache.org/jira/browse/IGNITE-27746
> Project: Ignite
> Issue Type: Sub-task
> Reporter: Vladimir Steshin
> Priority: Major
> Labels: IEP-140, ise
> Fix For: 2.19
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Consider:
> * The Multi-DC feature is on.
> * A corner node from DC1 can't send a message to its next node in DC2.
> * DC2 is unavailable.
> * No node of DC1 can connect to any node in DC2.
> To prevent sequential node failures in DC1 we need to extend the connection
> recovery mechanics. We need to know whether DC2 is completely unavailable. If
> so, we switch to DC/brain split but keep nodes of DC1 online. To achieve this
> we might ping DC2's nodes from the edge node while it does the normal
> connection recovery, under the same connection recovery timeout. If the
> recovery fails and no ping to DC2 succeeds, we consider DC1 to work
> separately from DC2.
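
The check described above could be sketched roughly as follows. This is only an illustration built on java.util.concurrent, not the actual implementation: Dc2PingSketch, anyAlive, isDcSplit and the ping predicate are hypothetical names, and the timeout argument stands in for the connection recovery timeout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;

/** Hypothetical sketch; none of these names come from the Ignite code base. */
final class Dc2PingSketch {
    /** Pings all DC2 nodes in parallel; true if any ping succeeds within the timeout. */
    static <N> boolean anyAlive(List<N> dc2Nodes, Predicate<N> ping, long timeoutMs) {
        if (dc2Nodes.isEmpty())
            return false;

        ExecutorService pool = Executors.newFixedThreadPool(dc2Nodes.size());

        try {
            List<Callable<Boolean>> tasks = new ArrayList<>();

            for (N node : dc2Nodes) {
                // A failed ping throws, so invokeAny keeps waiting for another task.
                tasks.add(() -> {
                    if (ping.test(node))
                        return true;

                    throw new IllegalStateException("Node is unreachable: " + node);
                });
            }

            // Returns as soon as one task completes without an exception.
            return pool.invokeAny(tasks, timeoutMs, TimeUnit.MILLISECONDS);
        }
        catch (ExecutionException | TimeoutException e) {
            // Every ping failed, or the timeout elapsed first.
            return false;
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            return false;
        }
        finally {
            pool.shutdownNow();
        }
    }

    /** DC1 is considered split from DC2 only if recovery failed and no ping succeeded. */
    static boolean isDcSplit(boolean recoveryFailed, boolean anyDc2NodeAlive) {
        return recoveryFailed && !anyDc2NodeAlive;
    }
}
```

In this sketch the ping runs next to the normal recovery rather than replacing it, so the caller combines both results only after the shared timeout has expired.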