[ 
https://issues.apache.org/jira/browse/ARTEMIS-4305?focusedWorklogId=931212&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-931212
 ]

ASF GitHub Bot logged work on ARTEMIS-4305:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Aug/24 18:01
            Start Date: 21/Aug/24 18:01
    Worklog Time Spent: 10m 
      Work Description: jbertram commented on PR #4899:
URL: 
https://github.com/apache/activemq-artemis/pull/4899#issuecomment-2302663039

   > I am not sure I understand why should reconnect-attempts=0 be considered 
better in our case.
   
   It's better because it actually fits your use-case. Since persistence is 
disabled, _every_ time a broker restarts it will have a new identity. This 
is true even if you weren't on K8s. Because the broker's identity changes, 
the cluster-connection will also need to be torn down and recreated, 
so there's no reason to attempt to reconnect. Reconnecting a cluster connection 
only makes sense if the _same exact_ server is coming back.
   
   With `reconnect-attempts` > 0 and your fix from this PR then when a node 
leaves the cluster and gets recreated with a new identity using the same IP 
address the cluster-connection will reconnect only to then be destroyed when 
it's discovered that the node's identity has changed. It would make more sense 
to simply _not_ reconnect (i.e. using `0` for `reconnect-attempts`) and allow 
the node to join the cluster as any other new node would. This approach follows 
the current design and intended function of the cluster-connection.
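   To illustrate, a non-persistent broker configured along these lines might 
use a broker.xml fragment like the one below. This is only a sketch: the 
cluster-connection name, connector-ref, and discovery-group name are 
placeholders, not values from this PR.

   ```xml
   <core xmlns="urn:activemq:core">
      <!-- No journal: every restart produces a broker with a new identity. -->
      <persistence-enabled>false</persistence-enabled>

      <cluster-connections>
         <cluster-connection name="my-cluster">
            <connector-ref>artemis</connector-ref>
            <!-- Don't retry a broken cluster bridge. A restarted
                 non-persistent broker is effectively a brand-new node,
                 so let it rejoin the cluster from scratch instead. -->
            <reconnect-attempts>0</reconnect-attempts>
            <discovery-group-ref discovery-group-name="dg-group1"/>
         </cluster-connection>
      </cluster-connections>
   </core>
   ```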
   
   > All these are supposed to be valid configurations, right?
   
   In your case I would say the configuration is not valid. To be clear, not 
all configurations are valid for all use-cases. For example, if your use-case 
required messages to survive a broker restart then setting 
`persistence-enabled` to `false` would be technically possible, but it would 
not be valid.
   
   > Even if our current configuration is not the "best" one by some criteria, 
why should this bug remain in the code?
   
   It's not that your current configuration is not the "best." It's that your 
current configuration is directly leading to the problem with the stale 
`MessageFlowRecordImpl` and that could be fixed simply by changing your 
configuration.
   
   When evaluating which changes should be merged into the code-base one must 
weigh the risk of the change (e.g. additional complexity, possibility for 
unintended consequences, performance impact, etc.) against the benefit. In this 
case I do not believe the benefits outweigh the risks given the fact that 
you're not using a valid configuration for your use-case. Ultimately I'm not 
sure I'd actually consider the current behavior a bug given that a valid 
configuration solves the issue.
   
   That said, there may, in fact, be another bug that you hit with the valid 
configuration. I expect this bug would impact other users and would be worth 
investigating and resolving. 
   
   I empathize with you about the lost work/time regarding this PR. I know the 
feeling because I've lost lots of work myself when my own PRs were rejected. 
Ideally this is all part of the process to make the software better for 
everyone in the end. 
   
   Of course, you are free to maintain a fork of ActiveMQ Artemis and apply 
your own patches (e.g. this one) before deploying in your environment. That's 
one of the many great benefits of open source.




Issue Time Tracking
-------------------

    Worklog Id:     (was: 931212)
    Time Spent: 2h 40m  (was: 2.5h)

> Zero persistence does not work in kubernetes
> --------------------------------------------
>
>                 Key: ARTEMIS-4305
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-4305
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>            Reporter: Ivan Iliev
>            Priority: Major
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> In a cluster deployed in kubernetes, when a node is destroyed it terminates 
> the process and shuts down the network before the process has a chance to 
> close connections. Then a new node might be brought up, reusing the old 
> node’s IP. If this happens before the connection TTL expires, then from 
> Artemis’ point of view it looks as if the connection came back. Yet it is not 
> actually the same connection: the peer has a new node ID, etc. This confuses 
> the cluster, because the old message flow record is now invalid.
> One way to fix it could be if the {{Ping}} messages which are typically used 
> to detect dead connections could use some sort of connection id to match that 
> the other side is really the one which it is supposed to be.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
For further information, visit: https://activemq.apache.org/contact

