(activemq-artemis) branch main updated: ARTEMIS-5606 refactor doc chapter on duplicate detection

tabish Thu, 31 Jul 2025 15:41:42 -0700

This is an automated email from the ASF dual-hosted git repository.

tabish pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/activemq-artemis.git



The following commit(s) were added to refs/heads/main by this push:
     new dca2067ddf ARTEMIS-5606 refactor doc chapter on duplicate detection
dca2067ddf is described below

commit dca2067ddfc76f4fed0970305c7ab5a51a343ab7
Author: Justin Bertram <[email protected]>
AuthorDate: Thu Jul 31 17:01:13 2025 -0500

    ARTEMIS-5606 refactor doc chapter on duplicate detection
    
    Improvements in this commit include:
    
     - Fixing a broken code example for the Core client
     - Adding sub-sections to make it easier to find relevant info quickly
     - Adding more configuration samples
     - Fixing grammatical issues to increase readability
     - Adding new sub-section on performance
---
 docs/user-manual/duplicate-detection.adoc | 145 +++++++++++++++++++++---------
 1 file changed, 102 insertions(+), 43 deletions(-)

diff --git a/docs/user-manual/duplicate-detection.adoc 
b/docs/user-manual/duplicate-detection.adoc
index 0f05ec37fa..f2f9b5c475 100644
--- a/docs/user-manual/duplicate-detection.adoc
+++ b/docs/user-manual/duplicate-detection.adoc
@@ -4,7 +4,7 @@
 :docinfo: shared
 
 Apache ActiveMQ Artemis includes powerful automatic duplicate message 
detection, filtering out duplicate messages without you having to code your own 
fiddly duplicate detection logic at the application level.
-This chapter will explain what duplicate detection is, how Apache ActiveMQ 
Artemis uses it and how and where to configure it.
+This chapter will explain what duplicate detection is, how Apache ActiveMQ 
Artemis uses it, and how to configure it.
 
 When sending messages from a client to a server, or indeed from a server to 
another server, if the target server or connection fails sometime after sending 
the message, but before the sender receives a response that the send (or 
commit) was processed successfully then the sender cannot know for sure if the 
message was sent successfully to the address.
 
@@ -12,98 +12,142 @@ If the target server or connection failed after the send 
was received and proces
 From the sender's point of view it's not possible to distinguish these two 
cases.
 
 When the server recovers this leaves the client in a difficult situation.
-It knows the target server failed, but it does not know if the last message 
reached its destination ok.
-If it decides to resend the last message, then that could result in a 
duplicate message being sent to the address.
+It knows the target server failed, but it does not know if the last message 
reached its destination successfully.
+If it decides to resend the last message then that could result in a duplicate 
message being sent to the address.
 If each message was an order or a trade then this could result in the order 
being fulfilled twice or the trade being double booked.
 This is clearly not a desirable situation.
 
-Sending the message(s) in a transaction does not help out either.
+Sending the message(s) in a transaction does not help either.
 If the server or connection fails while the transaction commit is being 
processed it is also indeterminate whether the transaction was successfully 
committed or not!
 
 To solve these issues Apache ActiveMQ Artemis provides automatic duplicate 
message detection for messages sent to addresses.
 
 == Using Duplicate Detection for Message Sending
 
-Enabling duplicate message detection for sent messages is simple: you just 
need to set a special property on the message to a unique value.
+To enable duplicate message detection for sent messages you just need to set a 
special duplicate ID property on the message to a *unique* value.
 You can create the value however you like, as long as it is unique.
-When the target server receives the message it will check if that property is 
set, if it is, then it will check in its in memory cache if it has already 
received a message with that value of the header.
-If it has received a message with the same value before then it will ignore 
the message.
 
 [NOTE]
 ====
-
-
 Using duplicate detection to move messages between nodes can give you the same 
_once and only once_ delivery guarantees as if you were using an XA transaction 
to consume messages from source and send them to the target, but with less 
overhead and much easier configuration than using XA.
 ====
 
-If you're sending messages in a transaction then you don't have to set the 
property for _every_ message you send in that transaction, you only need to set 
it once in the transaction.
-If the server detects a duplicate message for any message in the transaction, 
then it will ignore the entire transaction.
+If you're sending messages in a transaction then you don't have to set the 
property for _every_ message you send in that transaction.
+You only need to set it once in the transaction.
+If the server detects a duplicate message for any message in the transaction 
then it will ignore the entire transaction.
 
-The name of the property that you set is given by the value of 
`org.apache.activemq.artemis.api.core.Message.HDR_DUPLICATE_DETECTION_ID`, 
which is `_AMQ_DUPL_ID`
+The name of the duplicate ID property is `_AMQ_DUPL_ID`. As a convenience, 
Java-based applications using the Core client can use the constant 
`org.apache.activemq.artemis.api.core.Message.HDR_DUPLICATE_DETECTION_ID` 
instead.
 
-When using JMS the value of the property must be a `String`, and similarly a 
string type would be used in other client APIs or protocols used with the 
broker.
-Its value should be unique.
-An easy way of generating a unique id is by generating a UUID.
+When using JMS the property's value must be a `String`, and similarly a string 
type would be used in other client APIs or protocols used with the broker.
 
 Here's an example of setting the property using the JMS API:
 
 [,java]
 ----
-...
-
 Message jmsMessage = session.createMessage();
 
 String myUniqueID = "This is my unique id";   // Could use a UUID for this
 
 jmsMessage.setStringProperty(HDR_DUPLICATE_DETECTION_ID.toString(), myUniqueID);
-
-...
 ----
 
-If using the Artemis Core client the value of the property can be of type 
`byte[]` or `SimpleString`.
+If using the Artemis Core client the value of the property can be of type 
`String`, `SimpleString`, or `byte[]`.
 
-Here's an example of setting the property using using the Core API:
+Here's an example of setting the property using the Core API:
 
 [,java]
 ----
-...
-
 ClientMessage message = session.createMessage(true);
 
-SimpleString myUniqueID = "This is my unique id";   // Could use a UUID for 
this
+SimpleString myUniqueID = SimpleString.of("This is my unique id");   // Could 
use a UUID for this
 
-message.setStringProperty(HDR_DUPLICATE_DETECTION_ID, myUniqueID);
+message.putStringProperty(HDR_DUPLICATE_DETECTION_ID, myUniqueID);
 ----
 
-== Configuring the Duplicate ID Cache
+== Duplicate Detection Semantics
 
-The server maintains caches of received values of the 
`org.apache.activemq.artemis.core.message.impl.HDR_DUPLICATE_DETECTION_ID` 
property sent to each address.
-Each address has its own distinct cache.
+The server maintains a *circular*, fixed-size, per-address cache of duplicate 
IDs from messages it receives.
 
-The cache is a circular fixed size cache.
-If the cache has a maximum size of `n` elements, then the ``n + 1``th id 
stored will overwrite the ``0``th element in the cache.
+When the server receives the message it will check if the duplicate ID 
property is set.
+If it is then it will check to see if its cache for the corresponding address 
already contains that duplicate ID.
+If the cache already contains that duplicate ID then the message will not be 
routed to any queues, and the server will log a `WARN` message, e.g.:
+[,console]
+----
+WARN  [org.apache.activemq.artemis.core.server] AMQ222059: Duplicate message 
detected - message will not be routed. Message information:
+CoreMessage[messageID=15, durable=false, userID=null, priority=4, 
timestamp=Thu Jan 01 00:00:00 UTC 1970, expiration=0, durable=false, 
address=myAddress, size=166, properties=TypedProperties[_AMQ_DUPL_ID=[6100 6200 
6300 6400 6500 6600 6700]]]@1034478028
+----
+If the cache does not contain that duplicate ID then it is added to the cache 
and the message is routed to any applicable queues.
 
-The maximum size of the cache is configured by the parameter `id-cache-size` 
in `broker.xml`, the default value is `20000` elements.
+Because the cache is circular, if it has a maximum size of `n` elements then 
the ``n + 1``th ID stored will overwrite the ``0``th element in the cache.
+Duplicate IDs are _only_ removed from the cache when they are overwritten or 
cleared administratively (e.g. using the `clearDuplicateIdCache` operation on 
the corresponding xref:management.adoc#address-management[`AddressControl`] 
from the web console).
+Even if a message is acknowledged or expires its duplicate ID is not removed 
from the cache because another message with that same duplicate ID may still be 
sent.
 
-To implement an address-specific `id-cache-size`, you can add to the
-corresponding address-settings section in `broker.xml`. Specify the
-desired `id-cache-size` value for the particular address. When a message
-is sent to an address with a specific `id-cache-size` configured, it
-will take precedence over the global `id-cache-size` value, allowing
-for greater flexibility and optimization of duplicate ID caching.
+== Configuring the Duplicate ID Cache
 
-The caches can also be configured to persist to disk or not.
-This is configured by the parameter `persist-id-cache`, also in `broker.xml`.
-If this is set to `true` then each id will be persisted to permanent storage 
as they are received.
-The default value for this parameter is `true`.
+The size of the duplicate ID cache can be configured globally for all 
addresses or on a per-address basis.
+
+Whether the cache is persisted to storage is also configurable.
 
 [NOTE]
 ====
-
-
 When choosing the size of the duplicate ID cache be sure to make it large 
enough that, if you resend messages, the IDs of all previously sent ones are 
still in the cache and have not been overwritten.
 ====
 
+=== Global Configuration
+
+The maximum size of the cache is configured by the parameter `id-cache-size` 
in `broker.xml`, e.g.:
+
+[,xml]
+----
+<core>
+   ...
+   <id-cache-size>5000</id-cache-size>
+   ...
+</core>
+----
+
+The default value for the global `id-cache-size` is `20000`. A value of `0` 
disables caching.
+
+=== Address-Specific Configuration
+
+To configure the cache size on a per-address basis use the `id-cache-size` 
parameter in the `address-settings` section in `broker.xml`, e.g.:
+
+[,xml]
+----
+<address-setting match="myAddress">
+   ...
+   <id-cache-size>1000</id-cache-size>
+   ...
+</address-setting>
+----
+
+When a message is sent to an address with a specific `id-cache-size` 
configured, that value takes precedence over the global `id-cache-size` value.
+This allows for greater flexibility and optimization of duplicate ID caches.
+
+The default value for the per-address `id-cache-size` is `20000`. A value of 
`0` disables caching.
+
+=== Persisting the Cache to Storage
+
+Duplicate ID caches are persisted to storage by default.
+The benefit to persisting the cache to storage is that if the broker is 
stopped for any reason then when it restarts the data will be read from storage 
back into the cache so duplicate messages can still be detected even if they 
were sent before the broker restarted.
+However, there is a cost in terms of performance since it takes longer to 
persist the data.
+
+Duplicate ID cache persistence is configured by the parameter 
`persist-id-cache` in `broker.xml`, e.g.:
+
+[,xml]
+----
+<core>
+   ...
+   <persist-id-cache>false</persist-id-cache>
+   ...
+</core>
+----
+If `persist-id-cache` is set to `true` then each ID will be persisted to 
storage as it is received.
+This is configured globally.
+It can't be configured on a per-address basis.
+
+The default value for `persist-id-cache` is `true`.
+
 == Duplicate Detection and Bridges
 
 Core bridges can be configured to automatically add a unique duplicate id 
value (if there isn't already one in the message) before forwarding the message 
to its target.
@@ -125,3 +169,18 @@ To configure a cluster connection to add the duplicate id 
header, simply set the
 The default value for this parameter is `true`.
 
 For more information on cluster connections and how to configure them, please 
see xref:clusters.adoc#clusters[Clusters].
+
+== Performance Considerations
+
+If you *do not need* duplicate detection at all, or need it only for certain 
addresses, it is best to set the global `id-cache-size` to `0` to prevent the 
server from pre-allocating internal cache-related objects, e.g.:
+
+[,xml]
+----
+<core>
+   ...
+   <id-cache-size>0</id-cache-size>
+   ...
+</core>
+----
+
+This will prevent needless consumption of heap memory so it is available to 
the broker for other uses.
\ No newline at end of file
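
The circular-cache behaviour described in the "Duplicate Detection Semantics" section of this patch can be illustrated with a small stand-alone Java sketch. It is not the broker's implementation: the class name `DuplicateIdCacheSketch` and the method `addIfAbsent` are hypothetical, and the sketch models only what the chapter states, namely a fixed-size cache for a single address in which the oldest ID is overwritten once the cache is full, and in which a size of `0` disables detection.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Set;

/**
 * Illustrative model of a fixed-size, circular duplicate ID cache for one address.
 * Hypothetical names; this is NOT the Artemis implementation.
 */
final class DuplicateIdCacheSketch {
    private final int maxSize;
    // Insertion order of IDs, oldest first, so the oldest can be overwritten.
    private final ArrayDeque<String> order = new ArrayDeque<>();
    // Same IDs as in 'order', kept in a set for fast membership checks.
    private final Set<String> ids = new HashSet<>();

    DuplicateIdCacheSketch(int maxSize) {
        if (maxSize < 0) {
            throw new IllegalArgumentException("maxSize must be >= 0");
        }
        this.maxSize = maxSize;
    }

    /**
     * Returns true if the ID was not in the cache (the message would be routed),
     * false if it is a duplicate (the message would not be routed).
     */
    boolean addIfAbsent(String duplicateId) {
        if (maxSize == 0) {
            return true; // a size of 0 disables caching, so nothing is detected
        }
        if (ids.contains(duplicateId)) {
            return false;
        }
        if (order.size() == maxSize) {
            // Cache is full: the oldest ID is overwritten by the new one.
            ids.remove(order.removeFirst());
        }
        order.addLast(duplicateId);
        ids.add(duplicateId);
        return true;
    }

    public static void main(String[] args) {
        DuplicateIdCacheSketch cache = new DuplicateIdCacheSketch(2);
        System.out.println(cache.addIfAbsent("a")); // new ID
        System.out.println(cache.addIfAbsent("a")); // duplicate
        cache.addIfAbsent("b");
        cache.addIfAbsent("c");                     // overwrites "a"
        System.out.println(cache.addIfAbsent("a")); // no longer detected
    }
}
```

The last call shows why the chapter advises choosing a cache size large enough to cover every message that might be resent: once an ID has been overwritten, a resend carrying that ID is treated as new.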


