Hi all, After spending a lot of time understanding this part (quite tricky), I've spent a lot of time with various attempts. But I was always breaking something else. I finally came to a simpler fix that works better without breaking other things.
I created a PR for it: https://github.com/apache/activemq/pull/1616 I ran it many times in a loop locally. I'm not sure it will pass the build because of all sorts of flaky tests. Feel free to review the PR and provide feedback. -- Jean-Louis Monteiro http://twitter.com/jlouismonteiro http://www.tomitribe.com On Thu, Dec 18, 2025 at 4:58 PM Arthur Naseef <[email protected]> wrote: > Not sure if I understood the entire flow here correctly, so take this with > a grain of salt. > > In a transacted session on a client using failover, there are going to > cases where the broker and client get out-of-sync, and so the broker > sending the same messages to the client (aka duplicates) sounds reasonable. > > Does the broker remember all of the messages delivered in the transaction, > and then accept them all as consumed on the commit packet from the client? > Or, is there an explicit ACK packet sent by the client for each message? > I'm wondering how we prevent a race condition on the broker delivering a > message to the client, and the client sending back a commit, both at the > same time. It's been a while since I looked at this code... > > Also in the case identified, the problem is that messages end up in the > DLQ, right? Does everything else work correctly (i.e. the consumer > received a copy of each message and processed each)? If so, then perhaps > we need a way to recognize that race condition and deal with it correctly > within the broker? > > Art > > > > On Wed, Dec 17, 2025 at 8:55 AM Jean-Louis Monteiro < > [email protected]> wrote: > > > Thanks for confirming. That was my understanding as well, that's why I > did > > not fix the test itself. > > > > I need to finish the connection leak in the PooledConnectionFactory and > > finish the Github Actions work with Jean Baptiste. > > I'll create an issue in JIRA for now and start thinking about fixing the > > ActiveMQMessageConsumer. > > > > When I have a PR up and running, I'll let you know because it will > require > > more eyes on it. > > > > -- > > Jean-Louis Monteiro > > http://twitter.com/jlouismonteiro > > http://www.tomitribe.com > > > > > > On Wed, Dec 17, 2025 at 4:24 PM Christopher Shannon < > > [email protected]> wrote: > > > > > HI Jean-Louis, > > > > > > I agree that this sounds like a bug if the scenario is playing out as > you > > > describe. If the consumer is receiving duplicate messages that were not > > > committed before then they should of course not be considered a > > duplicate. > > > The broker re-sending is what it is supposed to do as the connection > was > > > broken and the previous transaction was not committed by the consumer. > > > > > > As you pointed out, the previouslyDeliveredMessages is where this > should > > be > > > tracked, that gets populated a bit later though if I recall (it's been > a > > > while since i've been in that code). The messages first go into the > > > deliveredMessages linked list before being moved, so without actually > > > debugging it my assumption is maybe the messages make it to > > > deliveredMessages but don't make it all the way to > > > previouslyDeliveredMessages before the connection is broken, so there > may > > > need to be client work to fix that if that is what is happening. > > > > > > Chris > > > > > > On Wed, Dec 17, 2025 at 5:48 AM Jean-Louis Monteiro < > > > [email protected]> wrote: > > > > > > > Hi all, > > > > > > > > FailoverDurableSubTransactionTest sometimes fails, especially in > Github > > > > Actions or my own environment. In Apache Jenkins, it looks like it > does > > > not > > > > happen which suggests probably a race condition problem. > > > > > > > > FailoverDurableSubTransactionTest.testFailoverCommitListener:352 DLQ > > > empty > > > > expected:<0> but was:<10> > > > > > > > > The test fails because 10 messages unexpectedly end up in the Dead > > Letter > > > > Queue (DLQ). > > > > I looked at the test and I have an hypothesis ... > > > > 1/ the test intentionally breaks the connection during commit (see > > > plugin) > > > > 2/ before the commit breaks, the consumer has already received 10 > > > messages > > > > as part of the transaction, but transaction is not yet committed at > > this > > > > moment > > > > 3/ the transport reconnects, the broker correctly redelivers the 10 > > > > uncommitted messages > > > > 4/ client thinks they are duplicates and sends an ack to the broker > to > > > > indicate the messages are problematic and should go to DLQ > > > > > > > > To me, it's a client side issue. The messages previously received as > > part > > > > of the transaction that is never committed should probably not be > > treated > > > > as duplicated when the broker redelivers them after reconnecting. > > > > > > > > I tried looking at the following message in the logs when the test > > fails > > > > > > > > 2025-12-17 01:22:46,438 [ Session Task-1] - WARN > > ActiveMQMessageConsumer > > > - > > > > > ID:runnervm6qbrg-43909-1765934560760-30:2:1:1 suppressing duplicate > > > > > delivery on connection, poison acking: MessageDispatch {commandId = > > 0, > > > > > responseRequired = false, consumerId = > > > > > ID:runnervm6qbrg-43909-1765934560760-30:2:1:1, destination = > > > > > topic://Failover.WithTx, message = ActiveMQTextMessage {commandId = > > 15, > > > > > responseRequired = true, messageId = > > > > > ID:runnervm6qbrg-43909-1765934560760-32:1:1:1:12, > > originalDestination = > > > > > null, originalTransactionId = null, producerId = > > > > > ID:runnervm6qbrg-43909-1765934560760-32:1:1:1, destination = > > > > > topic://Failover.WithTx, transactionId = null, deliveryTime = 0, > > > > expiration > > > > > = 0, timestamp = 1765934566409, arrival = 0, brokerInTime = > > > > 1765934566409, > > > > > brokerOutTime = 1765934566433, correlationId = null, replyTo = > null, > > > > > persistent = true, type = null, priority = 4, groupID = null, > > > > groupSequence > > > > > = 0, targetConsumerId = null, compressed = false, userID = null, > > > content > > > > = > > > > > org.apache.activemq.util.ByteSequence@39112b96, > > marshalledProperties = > > > > > org.apache.activemq.util.ByteSequence@2c85f629, dataStructure = > > null, > > > > > redeliveryCounter = 1, size = 0, properties = {ID=11}, > > > > readOnlyProperties = > > > > > true, readOnlyBody = true, droppable = false, > > > jmsXGroupFirstForConsumer = > > > > > false, text = Test message}, redeliveryCounter = 1} > > > > > > > > > > > > > > I'm wondering if previouslyDeliveredMessages (in > > ActiveMQMessageConsumer) > > > > is not correctly populated in all failover scenarios, especially > > > > asynchronous dispatch. > > > > > > > > Fixing the test or reworking it would probably hide an underlying > bug. > > > > -- > > > > Jean-Louis Monteiro > > > > http://twitter.com/jlouismonteiro > > > > http://www.tomitribe.com > > > > > > > > > >
