[
https://issues.apache.org/jira/browse/ARTEMIS-4305?focusedWorklogId=931212&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-931212
]
ASF GitHub Bot logged work on ARTEMIS-4305:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 21/Aug/24 18:01
Start Date: 21/Aug/24 18:01
Worklog Time Spent: 10m
Work Description: jbertram commented on PR #4899:
URL:
https://github.com/apache/activemq-artemis/pull/4899#issuecomment-2302663039
> I am not sure I understand why should reconnect-attempts=0 be considered
better in our case.
It's better because it actually fits your use-case. Since persistence is
disabled, _every_ time a broker restarts it will have a new identity. This
is true even if you weren't on K8s. Since the broker's identity will change,
the cluster-connection will also need to be torn down and recreated, so
there's no reason to attempt to reconnect. Reconnecting a cluster connection
only makes sense if the _exact same_ server is coming back.
With `reconnect-attempts` > 0 and your fix from this PR, when a node
leaves the cluster and gets recreated with a new identity using the same IP
address, the cluster-connection will reconnect only to be destroyed once
it's discovered that the node's identity has changed. It would make more sense
to simply _not_ reconnect (i.e. use `0` for `reconnect-attempts`) and allow
the node to join the cluster as any other new node would. This approach follows
the current design and intended function of the cluster-connection.
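For reference, disabling reconnection for a cluster-connection is a one-line change in `broker.xml`. This is an illustrative fragment only; the cluster-connection name, connector, and discovery-group names below are placeholders, not values from this deployment:

```xml
<cluster-connections>
   <cluster-connection name="my-cluster">
      <!-- connector this broker advertises to the cluster (placeholder name) -->
      <connector-ref>artemis</connector-ref>
      <!-- 0 = never try to reconnect a broken cluster-connection; the
           returning node simply rejoins the cluster as a new node -->
      <reconnect-attempts>0</reconnect-attempts>
      <discovery-group-ref discovery-group-name="dg-group1"/>
   </cluster-connection>
</cluster-connections>
```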
> All these are supposed to be valid configurations, right?
In your case I would say the configuration is not valid. To be clear, not
all configurations are valid for all use-cases. For example, if your use-case
required messages to survive a broker restart, then setting
`persistence-enabled` to `false` would be technically possible, but it would
not be valid.
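To make the example concrete, `persistence-enabled` is a top-level setting in the `<core>` section of `broker.xml` (illustrative fragment):

```xml
<core xmlns="urn:activemq:core">
   <!-- false = no journal; all state (including the broker's identity
        data) is lost on restart, so the restarted broker is a new node -->
   <persistence-enabled>false</persistence-enabled>
</core>
```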
> Even if our current configuration is not the "best" one by some criteria,
why should this bug remain in the code?
It's not that your current configuration is not the "best." It's that your
current configuration is directly leading to the problem with the stale
`MessageFlowRecordImpl` and that could be fixed simply by changing your
configuration.
When evaluating which changes should be merged into the code-base one must
weigh the risk of the change (e.g. additional complexity, possibility for
unintended consequences, performance impact, etc.) against the benefit. In this
case I do not believe the benefits outweigh the risks given the fact that
you're not using a valid configuration for your use-case. Ultimately I'm not
sure I'd actually consider the current behavior a bug given that a valid
configuration solves the issue.
That said, there may, in fact, be another bug that you hit with the valid
configuration. I expect this bug would impact other users and would be worth
investigating and resolving.
I empathize with you about the lost work/time regarding this PR. I know the
feeling because I've lost lots of work myself when my own PRs were rejected.
Ideally this is all part of the process to make the software better for
everyone in the end.
Of course, you are free to maintain a fork of ActiveMQ Artemis and apply
your own patches (e.g. this one) before deploying in your environment. That's
one of the many great benefits of open source.
Issue Time Tracking
-------------------
Worklog Id: (was: 931212)
Time Spent: 2h 40m (was: 2.5h)
> Zero persistence does not work in kubernetes
> --------------------------------------------
>
> Key: ARTEMIS-4305
> URL: https://issues.apache.org/jira/browse/ARTEMIS-4305
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Reporter: Ivan Iliev
> Priority: Major
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> In a cluster deployed in kubernetes, when a node is destroyed it terminates
> the process and shuts down the network before the process has a chance to
> close connections. Then a new node might be brought up, reusing the old
> node’s ip. If this happens before the connection ttl expires, from artemis’
> point of view it looks as if the connection came back. Yet it is actually
> not the same: the peer has a new node id, etc. This confuses the cluster,
> since the old message flow record is invalid.
> One way to fix it could be for the {{Ping}} messages which are typically used
> to detect dead connections to carry some sort of connection id, so that the
> other side can be verified as really being the one it is supposed to be.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
For further information, visit: https://activemq.apache.org/contact