[
https://issues.apache.org/jira/browse/AMQ-9653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922300#comment-17922300
]
Ephemeris Lappis commented on AMQ-9653:
---------------------------------------
Another important detail : the total count of delivered (consumed) messages
from our tracing queue matches exactly both the expected count, according to
our test data, what is in our logs and the rows count that are intercepted by
the trigger on the message table.
This means that if some messages are delivered twice, with an inconsistent
order, other messages ar not processed at all : not intercepted by my trigger
on the message table, and never dispatched to the consumer, of course.
> ActiveMQ doesn't process and deliver messages into/from queues in right order
> -----------------------------------------------------------------------------
>
> Key: AMQ-9653
> URL: https://issues.apache.org/jira/browse/AMQ-9653
> Project: ActiveMQ Classic
> Issue Type: Bug
> Components: Broker
> Affects Versions: 5.18.3
> Environment: ActiveMQ is executed as a pod in Kubernetes clusters,
> with a OpenJDK 64-Bit Server VM version 17.0.9+9 (Eclipse Adoptium).
> The persistance DB is on a VM (not a container)
> The Karaf has the same configuration in the same cluster namespace.
> All in Linux images 5.15.0-116-generic #126-Ubuntu
> Reporter: Ephemeris Lappis
> Priority: Major
>
> After some load tests we can see random cases of messages that are delivered
> in an unexpected order. I our case this breaks the application for which the
> processing order is mandatory.
> FYI, we use a JDBC (PostgreSQL) persistance, although it may not be directly
> concerned.
> We've set some hacks to try to analyze where and when this occurs. First of
> them is a trigger on the "activemq_msgs" table to insert all the messages in
> a mirror table to keep them. Then we've substituted the application's queue
> consumer by a small Camel route that consumes messages and checks their
> logical orders. When a message arrives in a wrong sequence, a log is written
> in Karaf, and the message is copied into another backup queue. We also have
> the logs of the Camel routes that produce the messages that gives the origin
> chronology.
> Here an example where 5 messages constitute the "conversation". The table
> below (I put it at the end of the text) is built from the message mirror
> table. Its columns are :
> - the "id" (primary key) generated by ActiveMQ : this gives a first idea
> about message ordering.
> - the "ie_id" is our business identifier for a given batch of messages.
> - the "correlation_id" has its name says, a ID that may correlate different
> conversations, and is shared all over the application system.
> - the "etape", is the business step. I'll give more details about that later.
> - "date_traitement" is the current date that is set by the producer just
> before the message is sent.
> - "jms_timestamp_ms" and "jms_timestamp" are the JMSTimestamp of the message.
> - the "commit_time" is the pgsql transaction time of the inserted row.
> The logical order of our steps in almost all cases of conversation is :
> - 1 "initier_echange"
> - 1 "initier_utr" for each sub-batch (Logical transactional unit – TU)
> - 0 or more "diffuser_donnees_utr" for each TU (one for every generated
> data).
> - 1 "diffuser_utr" that closes the sub-batch.
> Our most common use cases manage only one UT with 1 or 2 data.
> According to the table, we can see that the "diffuser_utr" has been processed
> by ActiveMQ before the last "diffuser_donnees_utr". This last message seems
> to have been temporarly lost for a while : it's been persisted, and then
> delivered a very long time after it's been sent. See that the
> "date_traitement" is right, but both the "id" of the row and its commit time
> do not match. The JMS timestamps also seem inconsistent, first comparing with
> the "date_traitement", and then with a same value for the 4 last messages,
> even if the last one has a quite delay before it's been inserted and
> dispatched. Finally, the last message has been received by the consumer at
> 2025-01-23T15:45:39,236, thus the thrown error.
> This is just an example. In some other cases the very first message is
> delivered after all the others.
> On the broker side, no log about any strange behavior. The system and JVM
> monitoring do not show any resource lack issue.
> FYI, our load tests are done with a JMeter injector that set a 20 threads
> group, each one sending messages to 16 queues with loops from 10 to 100. The
> 16 queues are processed by 16 Camel routes that indirectly produce 4 or 5
> messages into the "buggy queue" for every received input. In the application,
> a single application with a single consumer processes them. The same in our
> testing Camel consumer.
> As the broker is some kind of black box for us, we can't give more
> information about how it manages this kind of load test that is more or less
> what could happen in a real rush situation.
> Of course I can provide us with any detail you could ask for. I've not found
> any interesting case googling for similar issues, but perhaps someone already
> had some.
> {{Thanks in advance.}}
> {noformat}
> |id |ie_id |correlation_id
> |etape |date_traitement
> |jms_timestamp_ms|jms_timestamp |commit_time |
> |-----|----------------------------------|----------------------------------|------------------|----------------------|--------------|---------------------------|----------------------------|
> |8 124
> |5009d4f3-dfac-4de2-8360-32999ee6aac7|6b371cdc-7071-4dfa-8c58-231770df4f8e|initier_echange
> |2025-01-23T14:45:15.963Z|1737643516108 |2025-01-23 15:45:16.108
> +0100|2025-01-23 15:45:15.964 +0100|
> |8 142
> |5009d4f3-dfac-4de2-8360-32999ee6aac7|6b371cdc-7071-4dfa-8c58-231770df4f8e|initier_utr
> |2025-01-23T14:45:16.003Z|1737643515969 |2025-01-23 15:45:15.969
> +0100|2025-01-23 15:45:16.005 +0100|
> |8 165
> |5009d4f3-dfac-4de2-8360-32999ee6aac7|6b371cdc-7071-4dfa-8c58-231770df4f8e|diffuser_donnees_utr|2025-01-23T14:45:16.062Z|1737643515969
> |2025-01-23 15:45:15.969 +0100|2025-01-23 15:45:16.064 +0100|
> |8 166
> |5009d4f3-dfac-4de2-8360-32999ee6aac7|6b371cdc-7071-4dfa-8c58-231770df4f8e|diffuser_utr
> |2025-01-23T14:45:16.067Z|1737643515969 |2025-01-23 15:45:15.969
> +0100|2025-01-23 15:45:16.069 +0100|
> |18
> 982|5009d4f3-dfac-4de2-8360-32999ee6aac7|6b371cdc-7071-4dfa-8c58-231770df4f8e|diffuser_donnees_utr|2025-01-23T14:45:16.062Z|1737643515969
> |2025-01-23 15:45:15.969 +0100|2025-01-23 15:45:39.265 +0100|{noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
For further information, visit: https://activemq.apache.org/contact