[
https://issues.apache.org/jira/browse/AMQ-9653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923626#comment-17923626
]
Ephemeris Lappis commented on AMQ-9653:
---------------------------------------
After new tests, we've checked all the messages that have been persisted into
the 'activemq_msgs' table.
The tests were based on 136000 Camel exchanges that generated 560000 messages
into the tracing queue. Our trigger saved all these messages into a mirror
table. A Camel test consumer received exactly 560000 messages; among them, 15
were delivered twice or out of order. As said before, since the total count
matches while some messages were duplicated, this means that some other
messages were not delivered at all.
Reading the 560000 messages from the mirror table, we've checked the logical
integrity of the steps of the 136000 exchanges. The data in the messages table
are therefore all correct, and the trouble lies only in the dispatching
process.
I hope all these comments help to explain what happens and to find a solution.
Is there a configuration parameter, for example, that may impact message
dispatching?
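For example, we wondered whether per-destination policy settings could be
relevant. A minimal, hypothetical sketch of the kind of activemq.xml fragment
we mean (the queue name is a placeholder, and the attribute names should be
verified against the 5.18.x policyEntry documentation):

```xml
<!-- Hypothetical fragment for conf/activemq.xml: the queue name is a
     placeholder. strictOrderDispatch keeps dispatch order strict, and
     maxPageSize controls how many messages are paged in from the store. -->
<destinationPolicy>
  <policyMap>
    <policyEntries>
      <policyEntry queue="our.tracing.queue" strictOrderDispatch="true"
                   maxPageSize="200"/>
    </policyEntries>
  </policyMap>
</destinationPolicy>
```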
> ActiveMQ doesn't process and deliver messages into/from queues in right order
> -----------------------------------------------------------------------------
>
> Key: AMQ-9653
> URL: https://issues.apache.org/jira/browse/AMQ-9653
> Project: ActiveMQ Classic
> Issue Type: Bug
> Components: Broker
> Affects Versions: 5.18.3
> Environment: ActiveMQ is executed as a pod in Kubernetes clusters,
> with an OpenJDK 64-Bit Server VM version 17.0.9+9 (Eclipse Adoptium).
> The persistence DB is on a VM (not a container).
> The Karaf instance has the same configuration in the same cluster namespace.
> All on Linux images 5.15.0-116-generic #126-Ubuntu
> Reporter: Ephemeris Lappis
> Priority: Major
>
> After some load tests, we can see random cases of messages delivered in an
> unexpected order. In our case this breaks the application, for which the
> processing order is mandatory.
> FYI, we use JDBC (PostgreSQL) persistence, although it may not be directly
> involved.
> We've set up some hacks to try to analyze where and when this occurs. The
> first of them is a trigger on the "activemq_msgs" table that inserts all the
> messages into a mirror table to keep them. Then we've replaced the
> application's queue consumer with a small Camel route that consumes messages
> and checks their logical order. When a message arrives out of sequence, a
> log is written in Karaf, and the message is copied into another backup
> queue. We also have the logs of the Camel routes that produce the messages,
> which give the original chronology.
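To make the check concrete, here is a minimal, self-contained sketch (plain
Java, without the Camel and JMS plumbing, and assuming a numeric per-conversation
sequence can be derived from the message body — an illustration, not the real
route) of the kind of order check the test consumer performs: remember the
highest sequence seen per correlation ID and flag anything that arrives at or
behind it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of the order check done by the test consumer. This is an
// illustration only: the real route extracts the step and sequence from the
// message body; here we assume a numeric sequence per correlation ID.
public class OrderChecker {

    private final Map<String, Integer> lastSeen = new HashMap<>();
    private final List<String> anomalies = new ArrayList<>();

    // Called for every received message; records an anomaly when a message
    // arrives with a sequence number at or below the highest one seen so far.
    void onMessage(String correlationId, int sequence) {
        Integer previous = lastSeen.get(correlationId);
        if (previous != null && sequence <= previous) {
            anomalies.add(correlationId + ": received " + sequence
                    + " after " + previous);
        } else {
            lastSeen.put(correlationId, sequence);
        }
    }

    List<String> anomalies() {
        return anomalies;
    }

    public static void main(String[] args) {
        OrderChecker checker = new OrderChecker();
        checker.onMessage("conv-1", 1);
        checker.onMessage("conv-1", 2);
        checker.onMessage("conv-1", 4);
        checker.onMessage("conv-1", 3); // arrives behind 4: out of order
        System.out.println(checker.anomalies()); // one anomaly for "conv-1"
    }
}
```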
> Here is an example where 5 messages constitute the "conversation". The table
> below (placed at the end of the text) is built from the message mirror
> table. Its columns are:
> - "id": the primary key generated by ActiveMQ; this gives a first idea of
> message ordering.
> - "ie_id": our business identifier for a given batch of messages.
> - "correlation_id": as its name says, an ID that may correlate different
> conversations, and is shared across the application system.
> - "etape": the business step. I'll give more details about that later.
> - "date_traitement": the current date, set by the producer just before the
> message is sent.
> - "jms_timestamp_ms" and "jms_timestamp": the JMSTimestamp of the message.
> - "commit_time": the PostgreSQL transaction time of the inserted row.
> The logical order of our steps in almost all conversations is:
> - 1 "initier_echange"
> - 1 "initier_utr" for each sub-batch (logical transactional unit, TU)
> - 0 or more "diffuser_donnees_utr" for each TU (one for every piece of
> generated data)
> - 1 "diffuser_utr" that closes the sub-batch.
> Our most common use cases manage only one TU with 1 or 2 data items.
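The step sequence above can be written as a small grammar. A minimal sketch
(an assumption: the steps of one conversation are joined in arrival order;
this is not the real validation code) of checking a conversation against it:

```java
import java.util.List;
import java.util.regex.Pattern;

// Minimal sketch: validates that a conversation's steps follow the grammar
// described above -- one "initier_echange", then for each TU one
// "initier_utr", zero or more "diffuser_donnees_utr", and one "diffuser_utr".
public class StepOrderCheck {

    private static final Pattern GRAMMAR = Pattern.compile(
            "initier_echange(,initier_utr(,diffuser_donnees_utr)*,diffuser_utr)+");

    static boolean isValidConversation(List<String> steps) {
        return GRAMMAR.matcher(String.join(",", steps)).matches();
    }

    public static void main(String[] args) {
        // A well-ordered conversation with one TU and two data messages.
        System.out.println(isValidConversation(List.of(
                "initier_echange", "initier_utr",
                "diffuser_donnees_utr", "diffuser_donnees_utr",
                "diffuser_utr"))); // true
        // The failure from the report: a data message delivered after the
        // TU-closing message.
        System.out.println(isValidConversation(List.of(
                "initier_echange", "initier_utr", "diffuser_utr",
                "diffuser_donnees_utr"))); // false
    }
}
```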
> According to the table, we can see that the "diffuser_utr" was processed by
> ActiveMQ before the last "diffuser_donnees_utr". That last message seems to
> have been temporarily lost for a while: it was persisted, and then delivered
> a very long time after it was sent. Note that its "date_traitement" is
> right, but both the row's "id" and its commit time do not match. The JMS
> timestamps also seem inconsistent: first compared with "date_traitement",
> and then because the 4 last messages share the same value, even though the
> last one was inserted and dispatched with quite a delay. Finally, the last
> message was received by the consumer at 2025-01-23T15:45:39,236, hence the
> error thrown.
> This is just an example. In some other cases the very first message is
> delivered after all the others.
> On the broker side, there is no log of any strange behavior. System and JVM
> monitoring do not show any resource shortage.
> FYI, our load tests are done with a JMeter injector that sets up a group of
> 20 threads, each one sending messages to 16 queues with loops from 10 to
> 100. The 16 queues are processed by 16 Camel routes that indirectly produce
> 4 or 5 messages into the "buggy" queue for every received input. In the
> application, a single consumer processes them, and the same holds for our
> testing Camel consumer.
> As the broker is something of a black box for us, we can't give more
> information about how it manages this kind of load test, which is more or
> less what could happen in a real rush situation.
> Of course I can provide you with any detail you may ask for. I've not found
> any interesting case while googling for similar issues, but perhaps someone
> has already seen something like this.
> Thanks in advance.
> {noformat}
> |id   |ie_id                               |correlation_id                      |etape               |date_traitement         |jms_timestamp_ms|jms_timestamp                |commit_time                  |
> |8124 |5009d4f3-dfac-4de2-8360-32999ee6aac7|6b371cdc-7071-4dfa-8c58-231770df4f8e|initier_echange     |2025-01-23T14:45:15.963Z|1737643516108   |2025-01-23 15:45:16.108 +0100|2025-01-23 15:45:15.964 +0100|
> |8142 |5009d4f3-dfac-4de2-8360-32999ee6aac7|6b371cdc-7071-4dfa-8c58-231770df4f8e|initier_utr         |2025-01-23T14:45:16.003Z|1737643515969   |2025-01-23 15:45:15.969 +0100|2025-01-23 15:45:16.005 +0100|
> |8165 |5009d4f3-dfac-4de2-8360-32999ee6aac7|6b371cdc-7071-4dfa-8c58-231770df4f8e|diffuser_donnees_utr|2025-01-23T14:45:16.062Z|1737643515969   |2025-01-23 15:45:15.969 +0100|2025-01-23 15:45:16.064 +0100|
> |8166 |5009d4f3-dfac-4de2-8360-32999ee6aac7|6b371cdc-7071-4dfa-8c58-231770df4f8e|diffuser_utr        |2025-01-23T14:45:16.067Z|1737643515969   |2025-01-23 15:45:15.969 +0100|2025-01-23 15:45:16.069 +0100|
> |18982|5009d4f3-dfac-4de2-8360-32999ee6aac7|6b371cdc-7071-4dfa-8c58-231770df4f8e|diffuser_donnees_utr|2025-01-23T14:45:16.062Z|1737643515969   |2025-01-23 15:45:15.969 +0100|2025-01-23 15:45:39.265 +0100|
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)