[ 
https://issues.apache.org/jira/browse/AMQ-9653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ephemeris Lappis updated AMQ-9653:
----------------------------------
    Attachment: BP3.xml

> ActiveMQ doesn't process and deliver messages into/from queues in right order
> -----------------------------------------------------------------------------
>
>                 Key: AMQ-9653
>                 URL: https://issues.apache.org/jira/browse/AMQ-9653
>             Project: ActiveMQ Classic
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 5.18.3
>         Environment: ActiveMQ is executed as a pod in Kubernetes clusters, 
> with a OpenJDK 64-Bit Server VM version 17.0.9+9 (Eclipse Adoptium).
> The persistance DB is on a VM (not a container)
> The Karaf has the same configuration in the same cluster namespace.
> All in Linux images 5.15.0-116-generic #126-Ubuntu
>            Reporter: Ephemeris Lappis
>            Priority: Major
>         Attachments: BP3.xml
>
>
> After some load tests we can see random cases of messages that are delivered 
> in an unexpected order. I our case this breaks the application for which the 
> processing order is mandatory.
> FYI, we use a JDBC (PostgreSQL) persistance, although it may not be directly 
> concerned.
> We've set some hacks to try to analyze where and when this occurs. First of 
> them is a trigger on the "activemq_msgs" table to insert all the messages in 
> a mirror table to keep them. Then we've substituted the application's queue 
> consumer by a small Camel route that consumes messages and checks their 
> logical orders. When a message arrives in a wrong sequence, a log is written 
> in Karaf, and the message is copied into another backup queue. We also have 
> the logs of the Camel routes that produce the messages that gives the origin 
> chronology.
> Here an example where 5 messages constitute the "conversation". The table 
> below (I put it at the end of the text) is built from the message mirror 
> table. Its columns are :
>  - the "id" (primary key) generated by ActiveMQ : this gives a first idea 
> about message ordering.
>  - the "ie_id" is our business identifier for a given batch of messages.
>  - the "correlation_id" has its name says, a ID that may correlate different 
> conversations, and is shared all over the application system.
>  - the "etape", is the business step. I'll give more details about that later.
>  - "date_traitement" is the current date that is set by the producer just 
> before the message is sent.
>  - "jms_timestamp_ms" and "jms_timestamp" are the JMSTimestamp of the message.
>  - the "commit_time" is the pgsql transaction time of the inserted row.
> The logical order of our steps in almost all cases of conversation is :
>  - 1 "initier_echange"
>  - 1 "initier_utr" for each sub-batch (Logical transactional unit – TU)
>  - 0 or more "diffuser_donnees_utr" for each TU (one for every generated 
> data).
>  - 1 "diffuser_utr" that closes the sub-batch.
> Our most common use cases manage only one UT with 1 or 2 data.
> According to the table, we can see that the "diffuser_utr" has been processed 
> by ActiveMQ before the last "diffuser_donnees_utr". This last message seems 
> to have been temporarly lost for a while : it's been persisted, and then 
> delivered a very long time after it's been sent. See that the 
> "date_traitement" is right, but both the "id" of the row and its commit time 
> do not match. The JMS timestamps also seem inconsistent, first comparing with 
> the "date_traitement", and then with a same value for the 4 last messages, 
> even if the last one has a quite delay before it's been inserted and 
> dispatched. Finally, the last message has been received by the consumer at 
> 2025-01-23T15:45:39,236, thus the thrown error. 
> This is just an example. In some other cases the very first message is 
> delivered after all the others.
> On the broker side, no log about any strange behavior. The system and JVM 
> monitoring do not show any resource lack issue.
> FYI, our load tests are done with a JMeter injector that set a 20 threads 
> group, each one sending messages to 16 queues with loops from 10 to 100. The 
> 16 queues are processed by 16 Camel routes that indirectly produce 4 or 5 
> messages into the "buggy queue" for every received input. In the application, 
> a single application with a single consumer processes them. The same in our 
> testing Camel consumer.
> As the broker is some kind of black box for us, we can't give more 
> information about how it manages this kind of load test that is more or less 
> what could happen in a real rush situation.
> Of course I can provide us with any detail you could ask for. I've not found 
> any interesting case googling for similar issues, but perhaps someone already 
> had some.
> {{Thanks in advance.}}
> {noformat}
> |id    |ie_id                               |correlation_id                   
>    |etape               |date_traitement         
> |jms_timestamp_ms|jms_timestamp                |commit_time                  |
> |-----|----------------------------------|----------------------------------|------------------|----------------------|--------------|---------------------------|----------------------------|
> |8 124 
> |5009d4f3-dfac-4de2-8360-32999ee6aac7|6b371cdc-7071-4dfa-8c58-231770df4f8e|initier_echange
>      |2025-01-23T14:45:15.963Z|1737643516108   |2025-01-23 15:45:16.108 
> +0100|2025-01-23 15:45:15.964 +0100|
> |8 142 
> |5009d4f3-dfac-4de2-8360-32999ee6aac7|6b371cdc-7071-4dfa-8c58-231770df4f8e|initier_utr
>          |2025-01-23T14:45:16.003Z|1737643515969   |2025-01-23 15:45:15.969 
> +0100|2025-01-23 15:45:16.005 +0100|
> |8 165 
> |5009d4f3-dfac-4de2-8360-32999ee6aac7|6b371cdc-7071-4dfa-8c58-231770df4f8e|diffuser_donnees_utr|2025-01-23T14:45:16.062Z|1737643515969
>    |2025-01-23 15:45:15.969 +0100|2025-01-23 15:45:16.064 +0100|
> |8 166 
> |5009d4f3-dfac-4de2-8360-32999ee6aac7|6b371cdc-7071-4dfa-8c58-231770df4f8e|diffuser_utr
>         |2025-01-23T14:45:16.067Z|1737643515969   |2025-01-23 15:45:15.969 
> +0100|2025-01-23 15:45:16.069 +0100|
> |18 
> 982|5009d4f3-dfac-4de2-8360-32999ee6aac7|6b371cdc-7071-4dfa-8c58-231770df4f8e|diffuser_donnees_utr|2025-01-23T14:45:16.062Z|1737643515969
>    |2025-01-23 15:45:15.969 +0100|2025-01-23 15:45:39.265 +0100|{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
For further information, visit: https://activemq.apache.org/contact


Reply via email to