[
https://issues.apache.org/jira/browse/AMQ-9653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ephemeris Lappis updated AMQ-9653:
----------------------------------
Description:
After some load tests we can see random cases of messages that are delivered in
an unexpected order. In our case this breaks the application, for which the
processing order is mandatory.
FYI, we use JDBC (PostgreSQL) persistence, although it may not be directly
involved.
We've put a few hacks in place to try to analyze where and when this occurs.
The first is a trigger on the "activemq_msgs" table that inserts all the
messages into a mirror table to keep them. We've then replaced the
application's queue consumer with a small Camel route that consumes messages
and checks their logical order. When a message arrives out of sequence, a log
is written in Karaf, and the message is copied into a backup queue. We also
have the logs of the Camel routes that produce the messages, which give the
original chronology.
Here is an example where 5 messages constitute the "conversation". The table
below (placed at the end of the text) is built from the message mirror table.
Its columns are:
- "id": the primary key generated by ActiveMQ; this gives a first idea of
message ordering.
- "ie_id": our business identifier for a given batch of messages.
- "correlation_id": as its name says, an ID that may correlate different
conversations, and is shared across the whole application system.
- "etape": the business step. I'll give more details about that below.
- "date_traitement": the current date, set by the producer just before the
message is sent.
- "jms_timestamp_ms" and "jms_timestamp": the JMSTimestamp of the message.
- "commit_time": the PostgreSQL transaction time of the inserted row.
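For clarity, the two JMS timestamp columns carry the same information: the second is just the raw epoch value rendered in our local zone (+01:00). A small self-contained sketch of that rendering (the class and method names are illustrative, not part of our application):

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Illustrative helper: "jms_timestamp" is "jms_timestamp_ms" (the raw
// JMSTimestamp, milliseconds since the epoch) formatted in our +01:00 zone.
public class JmsTimestampCheck {
    static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS Z");

    static String render(long epochMillis) {
        ZonedDateTime zdt = Instant.ofEpochMilli(epochMillis)
            .atZone(ZoneOffset.ofHours(1));
        return FMT.format(zdt);
    }

    public static void main(String[] args) {
        // Value from the first row of the table at the end of the text.
        System.out.println(render(1737643516108L)); // 2025-01-23 15:45:16.108 +0100
    }
}
```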
The logical order of our steps in almost all conversations is:
- 1 "initier_echange"
- 1 "initier_utr" for each sub-batch (logical transactional unit, TU)
- 0 or more "diffuser_donnees_utr" for each TU (one for every generated data
item)
- 1 "diffuser_utr" that closes the sub-batch.
Our most common use cases manage only one TU with 1 or 2 data items.
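The order check our replacement consumer performs boils down to the following (a simplified, self-contained sketch: the class and method names are illustrative, and the real check runs inside the Camel route described above, keyed by "ie_id"):

```java
import java.util.List;
import java.util.regex.Pattern;

// Sketch of the per-conversation order check (illustrative names; the real
// check runs inside our Camel route, not in plain Java).
public class EtapeOrderCheck {
    // Expected shape of one conversation: "initier_echange", then for each
    // TU: "initier_utr", zero or more "diffuser_donnees_utr", "diffuser_utr".
    private static final Pattern EXPECTED = Pattern.compile(
        "initier_echange(,initier_utr(,diffuser_donnees_utr)*,diffuser_utr)+");

    public static boolean isInOrder(List<String> etapes) {
        return EXPECTED.matcher(String.join(",", etapes)).matches();
    }

    public static void main(String[] args) {
        // The sequence actually delivered in the example conversation:
        // a "diffuser_donnees_utr" arrives after the closing "diffuser_utr".
        List<String> delivered = List.of(
            "initier_echange", "initier_utr", "diffuser_donnees_utr",
            "diffuser_utr", "diffuser_donnees_utr");
        System.out.println(isInOrder(delivered)); // false

        List<String> expected = List.of(
            "initier_echange", "initier_utr", "diffuser_donnees_utr",
            "diffuser_utr");
        System.out.println(isInOrder(expected)); // true
    }
}
```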
According to the table, we can see that the "diffuser_utr" has been processed
by ActiveMQ before the last "diffuser_donnees_utr". This last message seems to
have been temporarily lost for a while: it was persisted, then delivered a
very long time after it was sent. Note that its "date_traitement" is right,
but both the "id" of the row and its commit time do not match. The JMS
timestamps also seem inconsistent: first when compared with
"date_traitement", and then because the 4 last messages share the same value,
even though the last one shows quite a delay before being inserted and
dispatched. Finally, the last message was received by the consumer at
2025-01-23T15:45:39,236, hence the error that was thrown.
This is just one example. In some other cases the very first message is
delivered after all the others.
On the broker side, there is no log of any strange behavior. System and JVM
monitoring do not show any resource shortage.
FYI, our load tests are done with a JMeter injector that sets up a group of 20
threads, each one sending messages to 16 queues with loops from 10 to 100. The
16 queues are processed by 16 Camel routes that indirectly produce 4 or 5
messages into the "buggy queue" for every received input. On the application
side, a single consumer processes them; the same goes for our testing Camel
consumer.
As the broker is something of a black box for us, we can't give more
information about how it manages this kind of load test, which is more or less
what could happen in a real rush situation.
Of course I can provide you with any detail you may ask for. I've not found
any relevant case while googling for similar issues, but perhaps someone has
already seen one.
Thanks in advance.
{noformat}
|id    |ie_id                               |correlation_id                      |etape               |date_traitement         |jms_timestamp_ms|jms_timestamp                |commit_time                  |
|------|------------------------------------|------------------------------------|--------------------|------------------------|----------------|-----------------------------|-----------------------------|
|8 124 |5009d4f3-dfac-4de2-8360-32999ee6aac7|6b371cdc-7071-4dfa-8c58-231770df4f8e|initier_echange     |2025-01-23T14:45:15.963Z|1737643516108   |2025-01-23 15:45:16.108 +0100|2025-01-23 15:45:15.964 +0100|
|8 142 |5009d4f3-dfac-4de2-8360-32999ee6aac7|6b371cdc-7071-4dfa-8c58-231770df4f8e|initier_utr         |2025-01-23T14:45:16.003Z|1737643515969   |2025-01-23 15:45:15.969 +0100|2025-01-23 15:45:16.005 +0100|
|8 165 |5009d4f3-dfac-4de2-8360-32999ee6aac7|6b371cdc-7071-4dfa-8c58-231770df4f8e|diffuser_donnees_utr|2025-01-23T14:45:16.062Z|1737643515969   |2025-01-23 15:45:15.969 +0100|2025-01-23 15:45:16.064 +0100|
|8 166 |5009d4f3-dfac-4de2-8360-32999ee6aac7|6b371cdc-7071-4dfa-8c58-231770df4f8e|diffuser_utr        |2025-01-23T14:45:16.067Z|1737643515969   |2025-01-23 15:45:15.969 +0100|2025-01-23 15:45:16.069 +0100|
|18 982|5009d4f3-dfac-4de2-8360-32999ee6aac7|6b371cdc-7071-4dfa-8c58-231770df4f8e|diffuser_donnees_utr|2025-01-23T14:45:16.062Z|1737643515969   |2025-01-23 15:45:15.969 +0100|2025-01-23 15:45:39.265 +0100|
{noformat}
> ActiveMQ doesn't process and deliver messages into/from queues in right order
> -----------------------------------------------------------------------------
>
> Key: AMQ-9653
> URL: https://issues.apache.org/jira/browse/AMQ-9653
> Project: ActiveMQ Classic
> Issue Type: Bug
> Components: Broker
> Affects Versions: 5.18.3
> Environment: ActiveMQ is executed as a pod in Kubernetes clusters,
> with an OpenJDK 64-Bit Server VM version 17.0.9+9 (Eclipse Adoptium).
> The persistence DB is on a VM (not a container).
> Karaf has the same configuration in the same cluster namespace.
> All on Linux images 5.15.0-116-generic #126-Ubuntu
> Reporter: Ephemeris Lappis
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
For further information, visit: https://activemq.apache.org/contact