[jira] [Comment Edited] (ARTEMIS-4794) CoreBridge: Duplicate message when bridge is stopped while messages being consumed by target node

Justin Bertram (Jira) Thu, 06 Jun 2024 10:43:05 -0700


    [ 
https://issues.apache.org/jira/browse/ARTEMIS-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17852892#comment-17852892
 ]


Justin Bertram edited comment on ARTEMIS-4794 at 6/6/24 5:42 PM:
-----------------------------------------------------------------

As discussed on Slack...

It's not necessarily a surprise that the same message might be on the source 
and the target at the same time given that the bridge uses asynchronous send 
acknowledgements. In fact, there's a window of time where this is true of every 
message sent by the bridge (i.e. the time between when the target receives the 
message and the source receives the asynchronous acknowledgement and removes 
the message it sent). This, of course, is a very short window of time, but it 
can be longer if, for example, the network connection fails for some reason.

However, this is where duplicate detection proves valuable. If something 
happens between the time the target receives the message and the time the 
source receives the asynchronous acknowledgement, then once everything is back 
to normal and the bridge begins to resend messages, any message sent from the 
source that is already on the target will simply be ignored due to duplicate 
detection. This provides "eventual consistency."
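
To make the duplicate-detection behavior concrete, here is a minimal sketch of 
a target that ignores a resent message based on its duplicate ID. It is a 
simplified model, not the broker's actual implementation; the class and method 
names (DedupTarget, receive, delivered) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified model of a target that drops resent messages by duplicate ID.
// This is NOT Artemis code; every name here is hypothetical.
final class DedupTarget {
    private final Set<String> seenIds = new LinkedHashSet<>();
    private final List<String> delivered = new ArrayList<>();

    /** Returns true if the message was accepted, false if it was ignored as a duplicate. */
    boolean receive(String duplicateId, String body) {
        if (!seenIds.add(duplicateId)) {
            return false; // already on the target: ignore the resend
        }
        delivered.add(body);
        return true;
    }

    List<String> delivered() {
        return delivered;
    }
}

final class DedupDemo {
    public static void main(String[] args) {
        DedupTarget target = new DedupTarget();
        // First send: the target stores the message, but assume the ack never reaches the source.
        target.receive("msg-1", "hello");
        // After recovery the bridge resends the same message with the same duplicate ID.
        boolean accepted = target.receive("msg-1", "hello");
        System.out.println("resend accepted: " + accepted);
        System.out.println("messages on target: " + target.delivered().size());
    }
}
```

The resend is rejected and the target still holds a single copy, which is the 
"eventual consistency" described above.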

All that said, the bridge doesn't wait for pending acknowledgements when it is 
stopped, and I believe that is what triggers this behavior. It could be avoided 
by simply waiting for the pending acks. For what it's worth, _pausing_ the 
bridge actually does wait for pending acks, so instead of stopping and 
restarting the bridge you can simply pause and resume it.
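
The difference between stopping and pausing can be illustrated with a toy, 
single-threaded model. This is only a sketch under stated assumptions: 
ToyBridge and all of its members are hypothetical, and the real BridgeImpl is 
asynchronous and considerably more involved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of a bridge with asynchronous send acknowledgements.
// Names and behavior are illustrative assumptions, not BridgeImpl's actual code.
final class ToyBridge {
    final List<String> source = new ArrayList<>();  // messages still held by the source
    final List<String> target = new ArrayList<>();  // messages received by the target
    private final Deque<String> pendingAcks = new ArrayDeque<>(); // sent, not yet acknowledged
    private boolean active = true;

    void send(String msg) {
        source.add(msg);
        target.add(msg);      // the target has the message...
        pendingAcks.add(msg); // ...but the source has not processed the ack yet
    }

    // An ack arriving from the target; it is dropped if the bridge is no longer active.
    private void onAck(String msg) {
        if (active) {
            source.remove(msg);
        }
    }

    private void deliverPendingAcks() {
        while (!pendingAcks.isEmpty()) {
            onAck(pendingAcks.poll());
        }
    }

    // Stop without waiting: acks that arrive afterwards are ignored.
    void stop() {
        active = false;
        deliverPendingAcks();
    }

    // Pause: process the pending acks first, then deactivate.
    void pause() {
        deliverPendingAcks();
        active = false;
    }
}
```

In this model, stop() after a send leaves the message on both source and 
target, while pause() leaves it only on the target. On a real broker the 
equivalent is calling pause() on the bridge, as shown in the issue description 
below, and resuming it later.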


> CoreBridge: Duplicate message when bridge is stopped while messages being 
> consumed by target node
> -------------------------------------------------------------------------------------------------
>
>                 Key: ARTEMIS-4794
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-4794
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 2.30.0, 2.34.0
>            Reporter: nmeylan
>            Priority: Major
>         Attachments: BridgeDuplicateMessagesARTEMIS4794Test.java
>
>
> +Attached test *BridgeDuplicateMessagesARTEMIS4794Test.java*+ highlights the 
> issue with _org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl_.
> Place it under 
> _tests/integration-tests/src/test/java/org/apache/activemq/artemis/tests/integration/cluster/bridge_.
> {*}Summary{*}:
>      When a bridge is stopped while messages are being consumed by the target 
> node, it can lead to duplicate messages.
> {*}Description{*}:
>     When using a bridge and programmatically *stopping* it while messages are 
> being consumed by the target node, the source node fails to get the 
> acknowledgement from the target node, and the messages now exist on both the 
> source and the target node.
> It appears that setting the "active" flag to false when 
> BridgeImpl.StopRunnable is called prevents messages from being acknowledged 
> by the _BridgeImpl::sendAcknowledged_ function.
>  
> {*}Context{*}:
> This bug appears in my code (a custom plugin) because I start and stop the 
> bridge programmatically to move messages from one node to another when some 
> conditions are met; if they are no longer met, I want to stop moving 
> messages.
>  
> *Notes:*
>  * Changing the bridge configuration parameters 
> {_}useDuplicateDetection{_}, {_}confirmationWindowSize{_} or 
> _producerWindowSize_ does not help to mitigate the issue
>  * Not related to large messages; I use large messages in my test to ease 
> reproduction 
>  * Reproduced on 2.30 and 2.34
>  * Calling pause() does not create duplicates: 
> {_}server.getClusterManager().getBridges().get(bridgeName).pause(){_};
>  
> *Resolution:*
> Maybe _StopRunnable::run_ should wait until _queue.getDeliveringCount() == 0_ 
> after removing the consumer but before going further in the stop process.
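
The resolution proposed in the description (wait until the delivering count 
reaches zero before continuing the stop) could be sketched roughly as follows. 
This is not a patch against BridgeImpl: DrainWaiter and awaitDrained are 
hypothetical names, the count is supplied through a plain IntSupplier, and the 
timeout is an assumption added so that a stop cannot block forever.

```java
import java.util.function.IntSupplier;

// Hypothetical helper: poll a "delivering count" until it reaches zero or a timeout expires.
final class DrainWaiter {
    /** Returns true if the count reached zero before the timeout, false otherwise. */
    static boolean awaitDrained(IntSupplier deliveringCount, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (deliveringCount.getAsInt() > 0) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up rather than block the stop forever
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }
}
```

A stop routine could call something like 
awaitDrained(() -> queue.getDeliveringCount(), 5000, 10) after removing the 
consumer, where the timeout and poll interval are arbitrary example values. An 
actual fix might instead be driven by the acknowledgement callback rather than 
by polling.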



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
