[jira] [Commented] (IGNITE-27746) MDC. Implement parallel ping of DC2's nodes with the connection recovery.

Sergey Chugunov (Jira) Thu, 14 May 2026 05:01:49 -0700


    [ 
https://issues.apache.org/jira/browse/IGNITE-27746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18080883#comment-18080883
 ]


Sergey Chugunov commented on IGNITE-27746:
------------------------------------------

I have an idea of how to improve this feature.

This patch introduces a new mechanism that performs connection checks in 
parallel with the old one, which tests nodes sequentially.

The parallel check mechanism can tell the sequential one that there is an 
alive node in a remote datacenter and that it should keep testing nodes. But 
the sequential mechanism may check at most 3 nodes before it segments the 
current node from the topology.

So if this alive node is more than 3 nodes away, the current node will 
segment itself anyway.
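
To illustrate the scenario, here is a minimal Java sketch. It is not taken from 
the patch: the class name, the method name, and the boolean-list model of the 
ring are hypothetical, and only the limit of 3 comes from the description above.

```java
import java.util.List;

public class SequentialCheckSketch {
    /** Limit from the comment above: at most 3 nodes are probed sequentially. */
    static final int MAX_SEQUENTIAL_CHECKS = 3;

    /**
     * Walks the next nodes in ring order (hypothetical model: one boolean per node,
     * true = reachable) and returns the index of the first reachable node,
     * or -1 if the limit is exhausted first, in which case the node would segment.
     */
    static int firstReachableWithinLimit(List<Boolean> reachable) {
        int limit = Math.min(MAX_SEQUENTIAL_CHECKS, reachable.size());

        for (int i = 0; i < limit; i++) {
            if (reachable.get(i))
                return i;
        }

        return -1;
    }

    public static void main(String[] args) {
        // The only alive node is the 5th one in ring order.
        List<Boolean> ring = List.of(false, false, false, false, true);

        // What the parallel check would report: some node is alive.
        boolean parallelFoundAlive = ring.contains(true);

        int idx = firstReachableWithinLimit(ring);

        System.out.println("parallel check found alive node: " + parallelFoundAlive);
        System.out.println("sequential result: " + (idx < 0 ? "SEGMENT" : "connect to node " + idx));
    }
}
```

In this model the parallel check reports an alive node, yet the sequential walk 
stops after 3 probes and the result is still a segmentation.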

We could improve this by integrating the parallel and sequential mechanisms 
more tightly.

> MDC. Implement parallel ping of DC2's nodes with the connection recovery.
> -------------------------------------------------------------------------
>
>                 Key: IGNITE-27746
>                 URL: https://issues.apache.org/jira/browse/IGNITE-27746
>             Project: Ignite
>          Issue Type: Sub-task
>            Reporter: Vladimir Steshin
>            Priority: Major
>              Labels: IEP-140, ise
>             Fix For: 2.19
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Consider:
>  * The Multi-DC feature is on.
>  * A corner node from DC1 can't send a message to its next node in DC2.
>  * DC2 is unavailable.
>  * No node of DC1 can connect to any node in DC2.
> To prevent sequential node failures in DC1, we need to extend the connection 
> recovery mechanics. We need to know whether DC2 is completely unavailable. If 
> so, we switch to a DC/brain split but keep the nodes of DC1 online. To achieve 
> this, we might ping DC2's nodes from the edge node while it does the normal 
> connection recovery, under the same connection recovery timeout. If the 
> recovery fails and no ping to DC2 succeeds, we consider DC1 to work 
> separately from DC2.
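
The idea in the description can be sketched roughly as follows. This is only an 
illustration under stated assumptions, not the implementation from the patch: 
the class and method names are hypothetical, and the ping itself is a stub 
predicate standing in for a real network probe.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;

public class ParallelPingSketch {
    /**
     * Pings all given addresses in parallel and returns true as soon as any ping
     * succeeds; returns false if all fail or the shared timeout expires.
     * The ping predicate is a hypothetical stand-in for a real connection attempt.
     */
    static boolean anyReachable(List<String> addrs, Predicate<String> ping, long timeoutMs) {
        if (addrs.isEmpty())
            return false;

        ExecutorService pool = Executors.newFixedThreadPool(addrs.size());

        try {
            List<Callable<Boolean>> tasks = new ArrayList<>();

            for (String addr : addrs) {
                tasks.add(() -> {
                    if (ping.test(addr))
                        return true;

                    // A failed ping must throw so that invokeAny keeps waiting for others.
                    throw new IOException("unreachable: " + addr);
                });
            }

            return pool.invokeAny(tasks, timeoutMs, TimeUnit.MILLISECONDS);
        }
        catch (ExecutionException | TimeoutException e) {
            return false; // No ping succeeded within the timeout.
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            return false;
        }
        finally {
            pool.shutdownNow();
        }
    }

    /** DC1 is treated as split from DC2 only if recovery failed and no ping succeeded. */
    static boolean isDcSplit(boolean recoveryOk, boolean anyPingOk) {
        return !recoveryOk && !anyPingOk;
    }

    public static void main(String[] args) {
        List<String> dc2 = List.of("dc2-node-1", "dc2-node-2", "dc2-node-3");

        // Stub: every DC2 node is unreachable, and the normal recovery failed as well.
        boolean anyPingOk = anyReachable(dc2, addr -> false, 1_000);

        System.out.println("DC split: " + isDcSplit(false, anyPingOk));
    }
}
```

Here `invokeAny` applies one timeout to all pings, which mirrors running them 
under the same connection recovery timeout.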



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
