[
https://issues.apache.org/jira/browse/AMQ-9653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925950#comment-17925950
]
Ephemeris Lappis edited comment on AMQ-9653 at 2/11/25 11:13 AM:
-----------------------------------------------------------------
New findings about our issue: as our use case is probably too complex to be
analysed as is, I've tried to reproduce the problem with a simpler example
based more or less on the same context. This example is a single Camel
blueprint with:
* 1 to N JMS consumers, each waiting for messages on its own queue
* All these consumers, after checking the received messages, forward them
to another queue
* This queue is consumed by a global consumer that also checks the messages
forwarded by the N previous routes
* A producer that sends messages into the N queues: this is the test injector.
Message checking compares a header carrying an incrementing sequence number
with a separate counter holding the expected value. When the two values differ,
messages are not being received in the order in which they were sent.
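The check described above can be sketched in plain Java. This is a hypothetical illustration, not the logic of the attached BP3.xml: the class name `SequenceChecker` is invented, and the sequence number is passed as a plain `long` instead of being read from a JMS header. As an assumption, the expected counter simply advances by one per message, whatever was received.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the consumer compares each message's sequence number
// with a local counter holding the value it expects next.
public class SequenceChecker {
    private long expected;
    private final List<String> errors = new ArrayList<>();

    public SequenceChecker(long firstExpected) {
        this.expected = firstExpected;
    }

    /** Returns true if {@code received} is the expected number; otherwise records an error. */
    public boolean check(long received) {
        boolean inOrder = received == expected;
        if (!inOrder) {
            errors.add("expected=" + expected + " received=" + received);
        }
        // Assumption: the counter advances by one per message, whatever was received.
        expected++;
        return inOrder;
    }

    public List<String> errors() {
        return errors;
    }

    public static void main(String[] args) {
        SequenceChecker checker = new SequenceChecker(1);
        // Messages 3 and 4 are swapped on arrival.
        for (long n : new long[] {1, 2, 4, 3, 5}) {
            checker.check(n);
        }
        System.out.println(checker.errors());
    }
}
```

Running `main` prints `[expected=3 received=4, expected=4 received=3]`. Resynchronizing on the received value instead (`expected = received + 1`) would report a single lost message as one error rather than one per later message, at the cost of reporting a swap of two messages as three errors.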
The message payload size is randomly set between 1 KB and 4 MB to be as close
as possible to our business case.
In this configuration, where all the messages are sent and consumed very
quickly, no issue occurs: I've run it for about 12 hours without any error.
If I first stop the producer, so that all the sent messages get consumed, then
stop the global consumer, and then restart the producer and let it run for some
time, the global queue size logically grows. When I finally restart the global
consumer, it starts consuming the pending messages, but the message order
breaks very quickly.
The 3 error messages (logged until the producer route was actually stopped):
|*2025-02-11 09:37:16.094*|alice-echanges-karaf-7dcd965495-5ptf4|2025-02-11T09:37:15,471 \| ERROR \| Camel (bp3) thread #300 - JmsConsumer[_TEST.xx] \| bp3 \| 155 - org.apache.camel.camel-core-reifier - 3.22.1 \| \| Bad message number on bp3-jms-consumer-x from bp3-jms-consumer-08#1547: expected=15463 received=15468|
|*2025-02-11 09:37:10.066*|alice-echanges-karaf-7dcd965495-5ptf4|2025-02-11T09:37:09,003 \| ERROR \| Camel (bp3) thread #300 - JmsConsumer[_TEST.xx] \| bp3 \| 155 - org.apache.camel.camel-core-reifier - 3.22.1 \| \| Bad message number on bp3-jms-consumer-x from bp3-jms-consumer-04#1547: expected=15462 received=15464|
|*2025-02-11 09:36:49.018*|alice-echanges-karaf-7dcd965495-5ptf4|2025-02-11T09:36:46,470 \| ERROR \| Camel (bp3) thread #300 - JmsConsumer[_TEST.xx] \| bp3 \| 155 - org.apache.camel.camel-core-reifier - 3.22.1 \| \| Bad message number on bp3-jms-consumer-x from bp3-jms-consumer-02#1547: expected=15461 received=15462|
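Read in chronological order, the distance between the received and the expected number in the three errors above grows (1, 2, then 5). A minimal sketch for extracting that distance from such log lines, assuming they always end with `expected=<n> received=<m>` as shown; the class name `LogGapParser` is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts (received - expected) from "Bad message number" log lines.
public class LogGapParser {
    // Matches the "expected=... received=..." part of the error lines.
    private static final Pattern GAP = Pattern.compile("expected=(\\d+) received=(\\d+)");

    /** Returns received - expected, or throws if the line contains no such pair. */
    public static long gap(String logLine) {
        Matcher m = GAP.matcher(logLine);
        if (!m.find()) {
            throw new IllegalArgumentException("no expected/received pair: " + logLine);
        }
        return Long.parseLong(m.group(2)) - Long.parseLong(m.group(1));
    }

    public static void main(String[] args) {
        // The three errors above, oldest first.
        String[] lines = {
            "Bad message number on bp3-jms-consumer-x from bp3-jms-consumer-02#1547: expected=15461 received=15462",
            "Bad message number on bp3-jms-consumer-x from bp3-jms-consumer-04#1547: expected=15462 received=15464",
            "Bad message number on bp3-jms-consumer-x from bp3-jms-consumer-08#1547: expected=15463 received=15468"
        };
        for (String line : lines) {
            System.out.println(gap(line));
        }
    }
}
```

Running `main` prints 1, 2 and 5, one per line.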
The test blueprint BP3.xml that I execute on Karaf is attached.
> ActiveMQ doesn't process and deliver messages into/from queues in right order
> -----------------------------------------------------------------------------
>
> Key: AMQ-9653
> URL: https://issues.apache.org/jira/browse/AMQ-9653
> Project: ActiveMQ Classic
> Issue Type: Bug
> Components: Broker
> Affects Versions: 5.18.3
> Environment: ActiveMQ runs as a pod in Kubernetes clusters,
> with OpenJDK 64-Bit Server VM version 17.0.9+9 (Eclipse Adoptium).
> The persistence DB is on a VM (not a container).
> Karaf has the same configuration in the same cluster namespace.
> Everything runs on Linux images (kernel 5.15.0-116-generic #126-Ubuntu).
> Reporter: Ephemeris Lappis
> Priority: Major
> Attachments: BP3.xml
>
>
> After some load tests, we see random cases of messages being delivered
> in an unexpected order. In our case this breaks the application, for which the
> processing order is mandatory.
> FYI, we use JDBC (PostgreSQL) persistence, although it may not be directly
> involved.
> We've set up some hacks to analyze where and when this occurs. The first
> is a trigger on the "activemq_msgs" table that copies every inserted message
> into a mirror table to keep it. Then we replaced the application's queue
> consumer with a small Camel route that consumes messages and checks their
> logical order. When a message arrives out of sequence, a log entry is written
> in Karaf, and the message is copied into another backup queue. We also have
> the logs of the Camel routes that produce the messages, which give the original
> chronology.
> Here is an example where 5 messages make up the "conversation". The table
> below (placed at the end of the text) is built from the message mirror
> table. Its columns are:
> - the "id" (primary key) generated by ActiveMQ: this gives a first idea
> of the message ordering.
> - the "ie_id" is our business identifier for a given batch of messages.
> - the "correlation_id" is, as its name says, an ID that may correlate different
> conversations, and is shared across the whole application system.
> - the "etape" is the business step. I'll give more details about that later.
> - "date_traitement" is the current date set by the producer just
> before the message is sent.
> - "jms_timestamp_ms" and "jms_timestamp" are the JMSTimestamp of the message
> (in epoch milliseconds and as a date, respectively).
> - the "commit_time" is the PostgreSQL transaction time of the inserted row.
> The logical order of our steps in almost all conversations is:
> - 1 "initier_echange"
> - 1 "initier_utr" for each sub-batch (logical transactional unit, TU)
> - 0 or more "diffuser_donnees_utr" for each TU (one per generated
> data item).
> - 1 "diffuser_utr" that closes the sub-batch.
> Our most common use cases handle only one TU with 1 or 2 data items.
> The table shows that ActiveMQ processed the "diffuser_utr" before the last
> "diffuser_donnees_utr". That last message seems to have been temporarily lost:
> it was persisted, and then delivered long after it was sent (its row was
> committed about 23 seconds after its "date_traitement", versus 1 to 2 ms for
> the other rows). Note that its "date_traitement" is right, but neither its row
> "id" nor its commit time matches. The JMS timestamps also look inconsistent:
> the first message's JMSTimestamp (15:45:16.108 +0100) is later than its
> "date_traitement" (15:45:15.963 +0100), while the last 4 messages all share the
> same value (15:45:15.969 +0100), earlier than their own "date_traitement", even
> though the last one was inserted and dispatched with a significant delay.
> Finally, the consumer received the last message at 2025-01-23T15:45:39,236,
> hence the error that was thrown.
> This is just an example. In some other cases the very first message is
> delivered after all the others.
> On the broker side, nothing unusual is logged. System and JVM
> monitoring show no resource shortage.
> FYI, our load tests are done with a JMeter injector using a 20-thread
> group, each thread sending messages to 16 queues with loop counts from 10 to
> 100. The 16 queues are processed by 16 Camel routes that indirectly produce
> 4 or 5 messages into the "buggy queue" for every received input. In the
> application, a single instance with a single consumer processes them, and the
> same holds for our Camel test consumer.
> As the broker is a black box for us, we can't give more information about
> how it handles this kind of load test, which roughly reflects what could
> happen during a real peak.
> Of course, I can provide any details you may ask for. I haven't found any
> relevant case googling for similar issues, but perhaps someone has already
> run into this.
> Thanks in advance.
> {noformat}
> |id   |ie_id                               |correlation_id                      |etape               |date_traitement         |jms_timestamp_ms|jms_timestamp                |commit_time                  |
> |-----|------------------------------------|------------------------------------|--------------------|------------------------|----------------|-----------------------------|-----------------------------|
> |8124 |5009d4f3-dfac-4de2-8360-32999ee6aac7|6b371cdc-7071-4dfa-8c58-231770df4f8e|initier_echange     |2025-01-23T14:45:15.963Z|1737643516108   |2025-01-23 15:45:16.108 +0100|2025-01-23 15:45:15.964 +0100|
> |8142 |5009d4f3-dfac-4de2-8360-32999ee6aac7|6b371cdc-7071-4dfa-8c58-231770df4f8e|initier_utr         |2025-01-23T14:45:16.003Z|1737643515969   |2025-01-23 15:45:15.969 +0100|2025-01-23 15:45:16.005 +0100|
> |8165 |5009d4f3-dfac-4de2-8360-32999ee6aac7|6b371cdc-7071-4dfa-8c58-231770df4f8e|diffuser_donnees_utr|2025-01-23T14:45:16.062Z|1737643515969   |2025-01-23 15:45:15.969 +0100|2025-01-23 15:45:16.064 +0100|
> |8166 |5009d4f3-dfac-4de2-8360-32999ee6aac7|6b371cdc-7071-4dfa-8c58-231770df4f8e|diffuser_utr        |2025-01-23T14:45:16.067Z|1737643515969   |2025-01-23 15:45:15.969 +0100|2025-01-23 15:45:16.069 +0100|
> |18982|5009d4f3-dfac-4de2-8360-32999ee6aac7|6b371cdc-7071-4dfa-8c58-231770df4f8e|diffuser_donnees_utr|2025-01-23T14:45:16.062Z|1737643515969   |2025-01-23 15:45:15.969 +0100|2025-01-23 15:45:39.265 +0100|
> {noformat}
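The expected step order listed in the description can be checked mechanically against the table. The sketch below is a hypothetical helper (`StepOrderValidator` is not part of the application or of Camel); it assumes a conversation holds exactly one "initier_echange" followed by one or more TUs that are not interleaved, and encodes that grammar as a regular expression over the joined step names. Applied to the 5 rows above, ordering by ActiveMQ "id" violates the grammar, while ordering by "date_traitement" satisfies it.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical validator for the step order of one conversation.
public class StepOrderValidator {
    // Assumption: one "initier_echange", then one or more non-interleaved TUs,
    // each being "initier_utr", zero or more "diffuser_donnees_utr", then "diffuser_utr".
    private static final Pattern CONVERSATION = Pattern.compile(
        "initier_echange( initier_utr( diffuser_donnees_utr)* diffuser_utr)+");

    public static boolean isValid(List<String> steps) {
        return CONVERSATION.matcher(String.join(" ", steps)).matches();
    }

    public static void main(String[] args) {
        // Steps ordered by ActiveMQ "id" (8124, 8142, 8165, 8166, 18982).
        List<String> byId = List.of("initier_echange", "initier_utr",
            "diffuser_donnees_utr", "diffuser_utr", "diffuser_donnees_utr");
        // Same steps ordered by "date_traitement" (set by the producer before sending).
        List<String> byDate = List.of("initier_echange", "initier_utr",
            "diffuser_donnees_utr", "diffuser_donnees_utr", "diffuser_utr");
        System.out.println(isValid(byId));   // false
        System.out.println(isValid(byDate)); // true
    }
}
```

A hand-written state machine would allow more precise error reporting (which step was unexpected); the regular expression keeps the sketch short.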
--
This message was sent by Atlassian Jira
(v8.20.10#820010)