[
https://issues.apache.org/jira/browse/IGNITE-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Evgeny Stanilovsky updated IGNITE-18772:
----------------------------------------
Fix Version/s: 3.2
(was: 3.1)
> Design mechanisms for messaging consistency
> -------------------------------------------
>
> Key: IGNITE-18772
> URL: https://issues.apache.org/jira/browse/IGNITE-18772
> Project: Ignite
> Issue Type: New Feature
> Components: networking
> Reporter: Roman Puchkovskiy
> Priority: Major
> Labels: ignite-3
> Fix For: 3.2
>
>
> We have a use case where node A asks node B to notify it when some event
> occurs on node B. This requires two round trips: the first (A invokes B)
> installs an event listener on B, and the second (B makes a strong send
> to A) notifies A about the event.
> To account for possible topology instability, the code at node A subscribes
> to onDisappeared(B), and the code at node B likewise subscribes to
> onDisappeared(A).
> A timeout might be installed on the invocation future on node A.
> Outcomes are:
> # If B is not in the topology as seen by A, the invoke future fails right
> away (B knows nothing about the invocation because there is no request)
> # If A loses sight of B before the invoke response is delivered to A, the
> invoke future fails at A, and B eventually deregisters the listener
> # If the invocation succeeds but the nodes lose sight of each other before
> the event happens, node A stops waiting and node B deregisters the listener
> # If the invocation succeeds and the event happens while the nodes see each
> other, the callback is delivered from B to A (with best-effort guarantees:
> retries until it is delivered, times out, or the nodes lose sight of each
> other)
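The four outcomes above, as seen from node A alone, can be sketched as a future that is completed by whichever of the competing signals arrives first. This is only an illustration under assumptions: `WaitOutcome` and `EventWait` are hypothetical names, not Ignite API, and the sketch reports failures as enum values instead of failing the future. It does not address the cross-node consistency problem; it only shows that A's local decision can be made exactly once.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical result of one subscription, from node A's point of view. */
enum WaitOutcome { PEER_NOT_VISIBLE, PEER_DISAPPEARED, TIMED_OUT, EVENT_DELIVERED }

/** Hypothetical helper: the first signal to arrive decides the outcome. */
final class EventWait {
    private final CompletableFuture<WaitOutcome> result = new CompletableFuture<>();

    EventWait(boolean peerVisible, long timeoutMillis) {
        if (!peerVisible) {
            // Outcome 1: B is not in A's topology, so no request is sent at all.
            result.complete(WaitOutcome.PEER_NOT_VISIBLE);
        } else {
            // Optional timeout on the invocation future (requires Java 9+).
            result.completeOnTimeout(WaitOutcome.TIMED_OUT, timeoutMillis, TimeUnit.MILLISECONDS);
        }
    }

    /** Outcomes 2 and 3: A's onDisappeared(B) subscription fired. */
    void onPeerDisappeared() {
        result.complete(WaitOutcome.PEER_DISAPPEARED); // no-op if already completed
    }

    /** Outcome 4: the callback from B reached A. */
    void onEventCallback() {
        result.complete(WaitOutcome.EVENT_DELIVERED); // no-op if already completed
    }

    CompletableFuture<WaitOutcome> result() {
        return result;
    }
}
```

Because `CompletableFuture.complete` only takes effect on the first call, a late callback from B after A has already given up is ignored locally. Whether B reaches the same conclusion is exactly the open question of this ticket.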
> The outcome must be consistent between nodes A and B. That is, it must never
> happen that one node acts as if the other node has disappeared while the
> other node acts as if the first node is still available.
> # Relation 'X sees Y' must be symmetric (in an eventual sense)
> # If node X currently does not see node Y, X must not accept messages from Y
> We could use the following invariant: if a node has disappeared from the
> topology, it cannot appear there again with the same identity (IGNITE-18712
> might help on the physical topology level).
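The second rule and the identity invariant could be combined in a single inbound filter. The sketch below is a minimal illustration, assuming node identities are plain strings; `InboundGate` and its methods are hypothetical names, not Ignite API. A message is accepted only while its sender is visible, and an identity that has left is never admitted again.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical inbound filter; node identities are modelled as plain strings. */
final class InboundGate {
    private final Set<String> visible = new HashSet<>();
    // Identities that have left. This set only grows in this sketch;
    // a real implementation would need some way to bound it.
    private final Set<String> departed = new HashSet<>();

    /** Returns false if this identity has already left and so may not rejoin. */
    synchronized boolean onAppeared(String nodeId) {
        if (departed.contains(nodeId)) {
            return false;
        }
        visible.add(nodeId);
        return true;
    }

    synchronized void onDisappeared(String nodeId) {
        departed.add(nodeId);
        visible.remove(nodeId);
    }

    /** A message is accepted only if its sender is currently visible. */
    synchronized boolean accept(String senderId) {
        return visible.contains(senderId);
    }
}
```

With this gate in place, a node that restarts has to come back under a fresh identity before its messages are accepted, so stale messages from the old incarnation cannot be mistaken for new ones.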
> Things that should be carefully considered:
> # Nodes might have different views of the topology: 'X sees Y' might not be
> symmetric at some points in time
> # Messaging with the described consistency guarantees might be useful over
> both physical and logical topologies. Perhaps we need an abstraction over a
> 'topology' that allows checking whether a node is visible and subscribing to
> its joined/left events?
> # How do we handle non-transient failures (like an NPE) differently from
> failures caused by node disappearance? Do we just keep retrying until the
> timeout is triggered, or do we crash the node if some unexpected failure
> occurs, or...?
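One possible shape for the 'topology' abstraction raised in the second item is sketched below. All names (`TopologyView`, `TopologyListener`, `InMemoryTopologyView`) are hypothetical and are not taken from the Ignite code base. The in-memory implementation exists only to show the contract: a visibility check plus joined/left notifications that fire once per actual change.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical listener for membership changes. */
interface TopologyListener {
    void onJoined(String nodeId);
    void onLeft(String nodeId);
}

/** Hypothetical view that could be backed by a physical or a logical topology. */
interface TopologyView {
    boolean isVisible(String nodeId);
    void subscribe(TopologyListener listener);
}

/** Toy implementation for illustration only. */
final class InMemoryTopologyView implements TopologyView {
    private final Set<String> members = ConcurrentHashMap.newKeySet();
    private final List<TopologyListener> listeners = new CopyOnWriteArrayList<>();

    @Override
    public boolean isVisible(String nodeId) {
        return members.contains(nodeId);
    }

    @Override
    public void subscribe(TopologyListener listener) {
        listeners.add(listener);
    }

    void join(String nodeId) {
        // Notify only if membership actually changed.
        if (members.add(nodeId)) {
            listeners.forEach(l -> l.onJoined(nodeId));
        }
    }

    void leave(String nodeId) {
        // Notify only if membership actually changed.
        if (members.remove(nodeId)) {
            listeners.forEach(l -> l.onLeft(nodeId));
        }
    }
}
```

Messaging code written against `TopologyView` would not need to know which kind of topology it is running over, which is the point of the question in the ticket.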
--
This message was sent by Atlassian Jira
(v8.20.10#820010)