[jira] [Work logged] (ARTEMIS-4305) Zero persistence does not work in kubernetes

ASF GitHub Bot (Jira) Tue, 20 Aug 2024 13:47:05 -0700


     [ 
https://issues.apache.org/jira/browse/ARTEMIS-4305?focusedWorklogId=931028&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-931028
 ]


ASF GitHub Bot logged work on ARTEMIS-4305:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Aug/24 20:46
            Start Date: 20/Aug/24 20:46
    Worklog Time Spent: 10m 
      Work Description: jbertram commented on PR #4899:
URL: 
https://github.com/apache/activemq-artemis/pull/4899#issuecomment-2299734904

   I added your test to the branch with my fix, and I can see my fix detecting 
a problem and closing the connection, but the test still fails, and I still see 
messages like this:
   ```
   WARN  [org.apache.activemq.artemis.core.server] AMQ222139: 
MessageFlowRecordImpl [nodeID=13207315-5f2f-11ef-b63b-5c80b6f32172, 
connector=TransportConfiguration(name=netty-connector, 
factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory)?port=61616&host=localhost,
 
queueName=$.artemis.internal.sf.my-cluster.13207315-5f2f-11ef-b63b-5c80b6f32172,
 
queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.13207315-5f2f-11ef-b63b-5c80b6f32172,
 postOffice=PostOfficeImpl [server=ActiveMQServerImpl::name=localhost], 
temp=false]@3ca984c7, isClosed=false, reset=true]::Remote queue binding 
exampleQueue2dc45dca-5f2f-11ef-b5ea-5c80b6f32172 has already been bound in the 
post office. Most likely cause for this is you have a loop in your cluster due 
to cluster max-hops being too large or you have multiple cluster connections to 
the same nodes using overlapping addresses
   ```
   Is this the kind of message you see in your K8s cluster when this problem 
occurs and is that what you were referring to in the Jira when you said this?
   
   > This messes things up with the cluster, the old message flow record is 
invalid.
   
   I reproduced this with a very simple manual test with 2 clustered nodes with 
persistence disabled. When I kill one node and restart it I see the `AMQ222139` 
message on the _other_ node. However, I resolved this by simply changing the 
configuration on the `cluster-connection` using:
   ```
   <reconnect-attempts>0</reconnect-attempts>
   ```
   I then cherry-picked your `ZeroPersistenceSymmetricalClusterTest` test to 
the `main` branch. The test fails by default, but when I change the various 
`broker.xml` files used by that test to set `reconnect-attempts` to `0`, the 
test passes. Also, given that persistence is disabled, this is the 
configuration I would recommend. Have you considered this configuration change 
in your environment? It seems this would resolve your problem with no code 
changes necessary.
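
   For context, here is a minimal sketch of where that setting might sit in a 
`broker.xml`. The names `my-cluster` and `netty-connector` are taken from the 
log message above; `other-node-connector` is a hypothetical connector name, and 
the remaining values are assumptions for illustration, not the configuration 
used by the test:
   ```xml
   <core xmlns="urn:activemq:core">
      <!-- Zero persistence: nothing is written to the journal. -->
      <persistence-enabled>false</persistence-enabled>

      <cluster-connections>
         <cluster-connection name="my-cluster">
            <connector-ref>netty-connector</connector-ref>
            <!-- 0 = do not try to reconnect to a cluster node once its
                 connection fails; the default is -1 (retry forever). -->
            <reconnect-attempts>0</reconnect-attempts>
            <message-load-balancing>ON_DEMAND</message-load-balancing>
            <max-hops>1</max-hops>
            <static-connectors>
               <!-- Hypothetical connector pointing at the other node. -->
               <connector-ref>other-node-connector</connector-ref>
            </static-connectors>
         </cluster-connection>
      </cluster-connections>
   </core>
   ```
   With this in place, a node that is killed and restarted should be treated as 
a new cluster member rather than as the old connection coming back.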




Issue Time Tracking
-------------------

    Worklog Id:     (was: 931028)
    Time Spent: 2h  (was: 1h 50m)

> Zero persistence does not work in kubernetes
> --------------------------------------------
>
>                 Key: ARTEMIS-4305
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-4305
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>            Reporter: Ivan Iliev
>            Priority: Major
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> In a cluster deployed in Kubernetes, destroying a node terminates the 
> process and shuts down the network before the process has a chance to 
> close its connections. A new node might then be brought up, reusing the old 
> node’s IP. If this happens before the connection TTL expires, then from 
> Artemis’ point of view it looks as if the connection came back. Yet it is 
> not the same peer: it has a new node ID, etc. This messes things up with the 
> cluster, the old message flow record is invalid.
> One way to fix it could be if the {{Ping}} messages which are typically used 
> to detect dead connections could use some sort of connection id to match that 
> the other side is really the one which it is supposed to be.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
For further information, visit: https://activemq.apache.org/contact
