[
https://issues.apache.org/jira/browse/ARTEMIS-4305?focusedWorklogId=915875&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-915875
]
ASF GitHub Bot logged work on ARTEMIS-4305:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 22/Apr/24 16:58
Start Date: 22/Apr/24 16:58
Worklog Time Spent: 10m
Work Description: iiliev2 opened a new pull request, #4899:
URL: https://github.com/apache/activemq-artemis/pull/4899
In a cluster deployed in Kubernetes, when a node is destroyed, the process is
terminated and the network is shut down before the process has a chance to
close its connections. A new node may then be brought up, reusing the old
node's IP. If this happens before the connection TTL expires, from Artemis'
point of view it looks as if the connection came back. Yet it is not the same
connection: the peer has a new node id, so the old message flow record is
invalid and the cluster state becomes inconsistent.
This also solves a similar issue: if a node goes down and a new one comes up
with a new nodeUUID and the same IP before the cluster connections on the
other nodes time out, those nodes get stuck and list both the old and the new
node in their topologies.
The changes are grouped into tightly related incremental commits to make it
easier to follow what changed:
1. `Ping` packets include the `nodeUUID`
2. Acceptors and connectors carry their `TransportConfiguration`
3. `RemotingConnectionImpl#doBufferReceived` checks each ping's `nodeUUID`
against the expected target and flags the connection as `unhealthy` on a
mismatch; `ClientSessionFactoryImpl` destroys unhealthy connections (in
addition to connections that did not receive any data in time)
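The detection logic in step 3 can be sketched roughly as follows. This is an
illustrative model only, not the actual Artemis internals: the class and
method names (`Connection`, `onPing`) are hypothetical stand-ins for
`RemotingConnectionImpl#doBufferReceived` and the unhealthy flag it sets.

```java
// Hypothetical sketch of the ping nodeUUID check described above.
public class NodeIdCheck {

   /** Minimal stand-in for a remoting connection tracking its peer's node id. */
   static final class Connection {
      private final String expectedNodeUUID; // node id learned at connect time
      private boolean unhealthy;

      Connection(String expectedNodeUUID) {
         this.expectedNodeUUID = expectedNodeUUID;
      }

      /** Called for every incoming Ping; flags the connection if the peer changed. */
      void onPing(String pingNodeUUID) {
         if (!expectedNodeUUID.equals(pingNodeUUID)) {
            unhealthy = true; // the IP was reused by a different node
         }
      }

      boolean isUnhealthy() {
         return unhealthy;
      }
   }

   public static void main(String[] args) {
      Connection conn = new Connection("node-A");
      conn.onPing("node-A");                  // same peer: still healthy
      System.out.println(conn.isUnhealthy()); // false
      conn.onPing("node-B");                  // a new node reused the old IP
      System.out.println(conn.isUnhealthy()); // true
   }
}
```

The key point is that the check turns an IP-reuse event, which the TTL
mechanism alone cannot see, into an explicit unhealthy state that the session
factory can act on.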
Issue Time Tracking
-------------------
Worklog Id: (was: 915875)
Remaining Estimate: 0h
Time Spent: 10m
> Zero persistence does not work in kubernetes
> --------------------------------------------
>
> Key: ARTEMIS-4305
> URL: https://issues.apache.org/jira/browse/ARTEMIS-4305
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Reporter: Ivan Iliev
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In a cluster deployed in Kubernetes, when a node is destroyed, the process
> is terminated and the network is shut down before the process has a chance
> to close its connections. A new node may then be brought up, reusing the old
> node's IP. If this happens before the connection TTL expires, from Artemis'
> point of view it looks as if the connection came back. Yet it is not the
> same connection: the peer has a new node id, so the old message flow record
> is invalid and the cluster state becomes inconsistent.
> One way to fix this would be for the {{Ping}} messages, which are typically
> used to detect dead connections, to carry some sort of connection id so the
> receiver can verify that the other side is really the peer it is supposed
> to be.
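The quoted suggestion above amounts to extending the ping payload with an
identifier. A minimal sketch, assuming a length-prefixed string appended after
the TTL field that `Ping` already carries; the wire format and class name
(`PingWithNodeId`) here are illustrative, not the actual Artemis encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a ping payload extended with the sender's node id.
public class PingWithNodeId {

   static ByteBuffer encode(long connectionTTL, String nodeUUID) {
      byte[] id = nodeUUID.getBytes(StandardCharsets.UTF_8);
      ByteBuffer buf = ByteBuffer.allocate(Long.BYTES + Integer.BYTES + id.length);
      buf.putLong(connectionTTL); // field the ping already carries
      buf.putInt(id.length);      // new: length-prefixed node id
      buf.put(id);
      buf.flip();
      return buf;
   }

   static String decodeNodeUUID(ByteBuffer buf) {
      buf.getLong();              // skip the TTL
      byte[] id = new byte[buf.getInt()];
      buf.get(id);
      return new String(id, StandardCharsets.UTF_8);
   }

   public static void main(String[] args) {
      ByteBuffer wire = encode(30_000L, "node-A");
      System.out.println(decodeNodeUUID(wire)); // node-A
   }
}
```

On receipt, the decoded id would be compared with the node id recorded when
the connection was first established, exactly the matching the reporter asks
for.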
--
This message was sent by Atlassian Jira
(v8.20.10#820010)