[jira] [Updated] (IGNITE-18772) Design reactive network messaging

Roman Puchkovskiy (Jira) Fri, 10 Feb 2023 07:16:55 -0800


     [ 
https://issues.apache.org/jira/browse/IGNITE-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Roman Puchkovskiy updated IGNITE-18772:
---------------------------------------
    Description: 
We have a use case where node A asks node B to notify it when some event 
occurs on node B. This requires two round trips: the first (A invokes B) 
installs an event listener on B, and the second (B makes a strong send to A) 
notifies A about the event.

To account for possible topology instability, code at node A subscribes to 
onDisappeared(B), and code at node B likewise subscribes to onDisappeared(A).

A timeout might be installed on node A.

The possible outcomes are:
 # If B is not in A's topology, the invoke future fails right away (B knows 
nothing about the invocation; no request is sent)
 # If A loses sight of B before the invoke response is delivered to A, the 
invoke future fails at A, and B eventually deregisters the listener
 # If the invocation succeeds but the nodes lose sight of each other before 
the event happens, node A stops waiting and node B deregisters the listener
 # The same happens if the callback times out
 # If the invocation succeeds and the event happens while the nodes see each 
other, the callback is delivered from B to A (with best-effort guarantees: it 
is retried until it is delivered, it times out, or the nodes lose sight of 
each other)
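
The outcomes listed above can be sketched from node A's point of view as a 
single future that is completed by whichever signal arrives first. This is 
only an illustration under stated assumptions: RemoteEventSubscription, 
Outcome, and awaitOutcome are hypothetical names, not existing Ignite APIs, 
and the three input futures stand in for the invoke response, the callback 
from B, and the onDisappeared(B) notification.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: none of these names are existing Ignite APIs.
class RemoteEventSubscription {
    /** What node A finally observes; constants follow the outcome list. */
    enum Outcome { INVOKE_FAILED, PEER_LOST, TIMED_OUT, EVENT_DELIVERED }

    /**
     * invoke:   completes when B confirms the listener is installed.
     * callback: completes when B's notification reaches A.
     * peerGone: completes when A's topology reports onDisappeared(B).
     */
    static CompletableFuture<Outcome> awaitOutcome(
            CompletableFuture<Void> invoke,
            CompletableFuture<Void> callback,
            CompletableFuture<Void> peerGone,
            long timeoutMillis) {
        CompletableFuture<Outcome> result = new CompletableFuture<>();

        // The invoke itself failed (outcomes 1 and 2).
        invoke.whenComplete((ok, err) -> {
            if (err != null) {
                result.complete(Outcome.INVOKE_FAILED);
            }
        });

        // B disappeared: before the invoke response (outcomes 1 and 2)
        // or after the listener was installed (outcome 3).
        peerGone.thenRun(() -> {
            boolean installed = invoke.isDone() && !invoke.isCompletedExceptionally();
            result.complete(installed ? Outcome.PEER_LOST : Outcome.INVOKE_FAILED);
        });

        // The event happened while the nodes saw each other (outcome 5).
        invoke.thenCompose(ok -> callback)
                .thenRun(() -> result.complete(Outcome.EVENT_DELIVERED));

        // Nothing happened in time (outcome 4). Only the first completion wins.
        return result.completeOnTimeout(Outcome.TIMED_OUT, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        CompletableFuture<Void> invoke = CompletableFuture.completedFuture(null);
        CompletableFuture<Void> callback = new CompletableFuture<>();
        CompletableFuture<Void> peerGone = new CompletableFuture<>();

        CompletableFuture<Outcome> outcome = awaitOutcome(invoke, callback, peerGone, 1_000);
        callback.complete(null); // B's notification arrives
        System.out.println(outcome.join());
    }
}
```

The sketch covers only node A; node B would need a mirror-image subscription 
to onDisappeared(A) in order to deregister the listener.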

The outcome must be consistent. That is, it must not happen that one node 
acts as if the other node has disappeared while the other node acts as if the 
first node is still available.
 # The relation 'X sees Y' must be symmetric (in an eventual sense)
 # If node X currently does not see node Y, X must not accept messages from Y
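
The second requirement could be enforced by a gate in front of the message 
handlers. The following is a minimal sketch, assuming node identities are 
plain strings and that the topology service calls onAppeared/onDisappeared; 
VisibilityGate is a hypothetical name, not an existing Ignite class.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: node X's local view of which peers it currently sees.
class VisibilityGate {
    private final Set<String> visible = ConcurrentHashMap.newKeySet();

    /** Called when the local topology view gains a node. */
    void onAppeared(String nodeId) {
        visible.add(nodeId);
    }

    /** Called when the local topology view loses a node. */
    void onDisappeared(String nodeId) {
        visible.remove(nodeId);
    }

    /** A message is handed to handlers only if its sender is currently visible. */
    boolean accept(String senderId) {
        return visible.contains(senderId);
    }

    public static void main(String[] args) {
        VisibilityGate gate = new VisibilityGate();
        gate.onAppeared("B");
        System.out.println(gate.accept("B"));
        gate.onDisappeared("B");
        System.out.println(gate.accept("B"));
    }
}
```

A gate like this addresses only the second requirement; the eventual symmetry 
of 'X sees Y' has to come from the topology layer itself.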

We could use the following invariant: if a node has disappeared from the 
topology, it cannot appear there again with the same identity (IGNITE-18712 
might help on the physical topology level).
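
One way to picture this invariant is a registry that remembers every identity 
that has left and refuses to admit it again. This is a sketch only: 
IdentityRegistry and its methods are hypothetical names, identities are 
assumed to be strings, and nothing here reflects how IGNITE-18712 is 
implemented.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: an identity that has left the topology is never reused.
class IdentityRegistry {
    private final Set<String> present = new HashSet<>();
    private final Set<String> departed = new HashSet<>();

    /** Returns false if this identity has already disappeared once. */
    synchronized boolean tryJoin(String nodeId) {
        if (departed.contains(nodeId)) {
            return false;
        }
        present.add(nodeId);
        return true;
    }

    /** Marks a present node as gone for good. */
    synchronized void leave(String nodeId) {
        if (present.remove(nodeId)) {
            departed.add(nodeId);
        }
    }

    synchronized boolean isPresent(String nodeId) {
        return present.contains(nodeId);
    }

    public static void main(String[] args) {
        IdentityRegistry registry = new IdentityRegistry();
        registry.tryJoin("b-1");
        registry.leave("b-1");
        System.out.println(registry.tryJoin("b-1")); // same identity again
        System.out.println(registry.tryJoin("b-2")); // restarted node, fresh identity
    }
}
```

With this rule a restarted node has to come back under a fresh identity, so 
state tied to the old identity (listeners, pending callbacks) can be dropped 
without the risk of a late message reviving it.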

Things that should be carefully considered:
 # Nodes might have different views of the topology: 'X sees Y' might not be 
symmetric at some points in time
 # What kinds of topologies are concerned? Should this work over the physical 
topology, the logical one, or either of them (at the user's discretion)?
 # How do we deal with non-transient failures (like an NPE) differently from 
failures caused by node disappearance? Do we just keep retrying until the 
timeout is triggered, or do we crash the node if some unexpected failure 
occurs, or...?
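
To make the last question concrete, here is a sketch of one of the options: 
retry only failures classified as transient, and surface anything else 
immediately. It is not a proposed answer. RetryPolicySketch is a hypothetical 
name, and treating IOException as the only transient failure is purely an 
assumption for the example.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical sketch of one option: retry transient failures until a deadline,
// propagate everything else (for example an NPE) on the first occurrence.
class RetryPolicySketch {
    /** Assumption for this example: only I/O failures count as transient. */
    static boolean isTransient(Throwable t) {
        return t instanceof IOException;
    }

    static <T> T callWithRetries(Callable<T> action, long timeoutMillis, long pauseMillis)
            throws Exception {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            try {
                return action.call();
            } catch (Exception e) {
                boolean expired = System.nanoTime() - deadline >= 0;
                if (!isTransient(e) || expired) {
                    throw e; // non-transient failure or timeout: stop retrying
                }
                Thread.sleep(pauseMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        String result = callWithRetries(() -> {
            if (++attempts[0] < 3) {
                throw new IOException("connection reset");
            }
            return "delivered";
        }, 1_000, 10);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```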

  was:
Let's suppose node A needs to send a message to node B.
 # Future returned by the call (on node A) will either complete successfully 
(if the message was delivered), or it will complete exceptionally (if node B 
disappeared from the topology)
 # Node B will only return the response if node A is still in the topology
 # All transient failures are retried automatically until the message is 
delivered OR the counterpart disappeared from the topology OR the future has 
been cancelled by the user. The retries are made under the hood.
 # As mentioned above, the user can cancel the delivery of the message

This could use the following invariant: if a node has disappeared from the 
topology, it cannot appear there again with the same identity (IGNITE-18712 
might help on the physical topology level).

Things that should be carefully considered:
 # Nodes might have different views of the topology: 'X sees Y' might not be 
symmetric
 # What kinds of topologies are concerned? Should this work over the physical 
topology, the logical one, or either of them (at the user's discretion)?
 # Should we return a {{Publisher<T>}} (or a specialization of it, like 
{{Mono<T>}}) instead of a {{CompletableFuture<T>}} (given that we want 
something reactive)?
 # How do we deal with non-transient failures (like an NPE) differently from 
failures caused by node disappearance? Do we just keep retrying until the 
timeout is triggered, or do we crash the node if some unexpected failure 
occurs, or...?


> Design reactive network messaging
> ---------------------------------
>
>                 Key: IGNITE-18772
>                 URL: https://issues.apache.org/jira/browse/IGNITE-18772
>             Project: Ignite
>          Issue Type: New Feature
>          Components: networking
>            Reporter: Roman Puchkovskiy
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0.0-beta2
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
