[
https://issues.apache.org/jira/browse/IGNITE-18630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Roman Puchkovskiy updated IGNITE-18630:
---------------------------------------
Description:
Currently, there are two topologies: physical (bound 1:1 to ScaleCube events)
and logical. Appearing in the physical topology (PT) starts validation which
(if successful) ends with addition to the logical topology (LT); dropping from
the PT immediately removes a node from the LT.
We use the PT as the set of nodes to which the current node can send messages.
This means that if ScaleCube loses sight of a node due to a transient glitch
(caused by a GC pause, for example), after which the node becomes visible again,
we still remove the node from the PT, making it impossible to deliver a message
to it; transient network glitches thus harm the reliability of messaging.
The suggestion is to switch to the following:
# We decouple the ScaleCube topology from the PT, so we now have 3 topologies:
the ScaleCube topology (tracked via ScaleCube events: the nodes that our node
considers alive from the point of view of the SWIM protocol), the physical
topology (nodes that we consider reachable and to which we can send messages),
and the logical topology (nodes that have passed validation and joined the
cluster)
# A node enters the PT when it appears in the ScaleCube topology (ST), but it
leaves the PT only when it leaves the LT
# Logical topology 'leave' events will be triggered by ST leave events, but
with a delay, so that if a node returns to the ST with the same ScaleCube ID,
the LT leave event is not fired
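The asymmetry in item 2 can be sketched as three plain sets: a node enters the PT on an ST 'appear' event, but leaves the PT only on an LT 'leave' event. This is a minimal illustration; class and field names are made up here and are not Ignite's actual API.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative tracker of the three topologies from the proposal.
class TopologyTracker {
    final Set<UUID> scaleCube = ConcurrentHashMap.newKeySet(); // ST: SWIM's view of alive nodes
    final Set<UUID> physical = ConcurrentHashMap.newKeySet();  // PT: nodes we can message
    final Set<UUID> logical = ConcurrentHashMap.newKeySet();   // LT: validated cluster members

    // Appearing in the ST puts the node into the PT (validation starts from there).
    void onScaleCubeAppeared(UUID launchId) {
        scaleCube.add(launchId);
        physical.add(launchId);
    }

    // Disappearing from the ST does NOT touch the PT; only leaving the LT does.
    void onScaleCubeLeft(UUID launchId) {
        scaleCube.remove(launchId);
    }

    // Leaving the LT is what finally removes the node from the PT.
    void onLogicalLeft(UUID launchId) {
        logical.remove(launchId);
        physical.remove(launchId);
    }
}
```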
Summing up:
# When a node appears in the ST, it appears in the PT
# When it appears in the PT, the validation process starts (which might lead to
adding the node to the LT)
# When a node leaves the ST, a delayed removal from the LT is scheduled. It is
cancelled if the node appears in the ST again
# When a node leaves the LT, it leaves the PT (making it impossible to send a
message to it)
# When doing a graceful shutdown, a node should send a 'graceful LT leave'
message so that it is dropped from the LT and the PT immediately, without the
timeout defined in item 3
# If a node has been removed from the LT, it cannot be readmitted to the PT
with the same ID (the ID here is the 'launch ID', not the consistent ID); to
re-enter, it must change its ID
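Item 3 (schedule the LT removal, cancel it if the node reappears) is essentially a cancellable delayed task per launch ID. A minimal sketch, assuming the launch ID is a {{String}} and the actual LT removal is an injected callback (in reality this would go through the CMG/Raft machinery):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch of the delayed LT 'leave', not Ignite's actual code.
class DelayedLogicalLeave {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final Map<String, ScheduledFuture<?>> pendingLeaves = new ConcurrentHashMap<>();
    private final long delayMillis;
    private final Consumer<String> removeFromLt; // assumed callback into the CMG side

    DelayedLogicalLeave(long delayMillis, Consumer<String> removeFromLt) {
        this.delayMillis = delayMillis;
        this.removeFromLt = removeFromLt;
    }

    // ST 'leave': schedule the LT removal after the configured delay.
    void onScaleCubeLeft(String launchId) {
        pendingLeaves.put(launchId, scheduler.schedule(() -> {
            pendingLeaves.remove(launchId);
            removeFromLt.accept(launchId);
        }, delayMillis, TimeUnit.MILLISECONDS));
    }

    // The node returned with the same ScaleCube ID in time: cancel the removal.
    void onScaleCubeAppeared(String launchId) {
        ScheduledFuture<?> pending = pendingLeaves.remove(launchId);
        if (pending != null) {
            pending.cancel(false);
        }
    }

    void stop() {
        scheduler.shutdown();
    }
}
```

The 'graceful LT leave' of item 5 would simply bypass this scheduler and invoke the removal immediately.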
As LT events are distributed using Raft, if a node loses the ability to connect
to the CMG leader, it will never drop other nodes from its PT, so it will keep
trying to deliver messages indefinitely. This seems OK.
One thing that should be considered is that {{TopologyService}} (for the PT)
and {{LogicalTopologyService}} are defined in different modules, which might
cause difficulties when subscribing to each other's events.
> Try to deliver a message until receiver drops out from logical topology
> -----------------------------------------------------------------------
>
> Key: IGNITE-18630
> URL: https://issues.apache.org/jira/browse/IGNITE-18630
> Project: Ignite
> Issue Type: Improvement
> Components: networking
> Reporter: Roman Puchkovskiy
> Assignee: Roman Puchkovskiy
> Priority: Major
> Labels: ignite-3
> Fix For: 3.0.0-beta2
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)