[
https://issues.apache.org/jira/browse/IGNITE-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Roman Puchkovskiy updated IGNITE-18772:
---------------------------------------
Description:
We have a use case where node A asks node B to notify node A when some event on
node B occurs. This requires two round trips: first RT (A invokes B) installs
an event listener on B, and second round trip (B makes a strong send to A)
notifies A about the event.
To account for possible topology instability, code at node A subscribes to
onDisappeared(B), same does code at node B (but to onDisappeared(A)).
A timeout might be installed on node A.
Outcomes are:
# If B is not in the topology for A, invoke future fails right away (B knows
nothing about invocation, there is no request)
# If A loses B from sight before invoke response is delivered to A, invoke
future fails at A, and B eventually deregisters the listener
# If invocation is ok, but nodes lose each other from sight before the event
happens, node A stops waiting and node B deregisters the listener
# Same happens if callback times out
# If invocation is ok and event happens while nodes see each other, callback
is delivered from B to A (with best effort guarantees, with retries till
delivered or timed out or nodes lose each other of sight)
The outcome must be consistent. That is, it cannot happen that one node acted
as if it thought that another node disappeared, but another node acted as if
first node was available.
# Relation 'X sees Y' must be symmetric (in an eventual sense)
# If node X currently does not see node Y, it cannot accept messages from it
We could use the following invariant: if a node has disappeared from the
topology, it cannot appear there again with same identity (IGNITE-18712 might
help on the physical topology level).
Things that should be carefully considered:
# Nodes might have different views of the topology: 'X sees Y' might not be
symmetric at some points in time
# What kinds of topologies are concerned? Should this work over physical
topology, logical one, or over any of them (on the user's discretion)?
# How do we deal with non-transient failures (like an NPE) different from
failures caused by node disappearance? Do we just keep retrying until timeout
is triggered, or we crash the node if some unexpected failure occurs, or...?
was:
Let's suppose node A needs to send a message to node B.
# Future returned by the call (on node A) will either complete successfully
(if the message was delivered), or it will complete exceptionally (if node B
disappeared from the topology)
# Node B will only return the response if node A is still in the topology
# All transient failures are retried automatically until the message is
delivered OR the counterpart disappeared from the topology OR the future has
been cancelled by the user. The retries are made under the hood.
# As mentioned above, the user can cancel the delivery of the message
This could use the following invariant: if a node has disappeared from the
topology, it cannot appear there again with same identity (IGNITE-18712 might
help on the physical topology level).
Things that should be carefully considered:
# Nodes might have different views of the topology: 'X sees Y' might not be
symmetric
# What kinds of topologies are concerned? Should this work over physical
topology, logical one, or over any of them (on the user's discretion)?
# Should we return a {{Publisher<T>}} (or its specialization like
{{{}Mono<T>{}}}) instead of a {{CompletableFuture<T>}} (given that we want
something reactive)?
# How do we deal with non-transient failures (like an NPE) different from
failures caused by node disappearance? Do we just keep retrying until timeout
is triggered, or we crash the node if some unexpected failure occurs, or...?
> Design reactive network messaging
> ---------------------------------
>
> Key: IGNITE-18772
> URL: https://issues.apache.org/jira/browse/IGNITE-18772
> Project: Ignite
> Issue Type: New Feature
> Components: networking
> Reporter: Roman Puchkovskiy
> Priority: Major
> Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> We have a use case where node A asks node B to notify node A when some event
> on node B occurs. This requires two round trips: first RT (A invokes B)
> installs an event listener on B, and second round trip (B makes a strong send
> to A) notifies A about the event.
> To account for possible topology instability, code at node A subscribes to
> onDisappeared(B), same does code at node B (but to onDisappeared(A)).
> A timeout might be installed on node A.
> Outcomes are:
> # If B is not in the topology for A, invoke future fails right away (B knows
> nothing about invocation, there is no request)
> # If A loses B from sight before invoke response is delivered to A, invoke
> future fails at A, and B eventually deregisters the listener
> # If invocation is ok, but nodes lose each other from sight before the event
> happens, node A stops waiting and node B deregisters the listener
> # Same happens if callback times out
> # If invocation is ok and event happens while nodes see each other, callback
> is delivered from B to A (with best effort guarantees, with retries till
> delivered or timed out or nodes lose each other of sight)
> The outcome must be consistent. That is, it cannot happen that one node acted
> as if it thought that another node disappeared, but another node acted as if
> first node was available.
> # Relation 'X sees Y' must be symmetric (in an eventual sense)
> # If node X currently does not see node Y, it cannot accept messages from it
> We could use the following invariant: if a node has disappeared from the
> topology, it cannot appear there again with same identity (IGNITE-18712 might
> help on the physical topology level).
> Things that should be carefully considered:
> # Nodes might have different views of the topology: 'X sees Y' might not be
> symmetric at some points in time
> # What kinds of topologies are concerned? Should this work over physical
> topology, logical one, or over any of them (on the user's discretion)?
> # How do we deal with non-transient failures (like an NPE) different from
> failures caused by node disappearance? Do we just keep retrying until timeout
> is triggered, or we crash the node if some unexpected failure occurs, or...?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)