[ 
https://issues.apache.org/jira/browse/ARTEMIS-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18059401#comment-18059401
 ] 

Claudiu Chioasca commented on ARTEMIS-5895:
-------------------------------------------

Thanks for your feedback.

The issue is not about duplicate detection, which works fine and is already 
handled in the enterprise app I'm working on.

The problem is that the commit() call doesn't fail at all on the client side: 
the broker doesn't seem to signal the failure, so the commit looks successful.

I replicated this with the JDK client (as reported), and we also noticed 
the same issue with CMS (I know... but we're still using it).

If you check the log I attached, no exception is raised for message 2373 (the 
missing one), but one is correctly raised for the next message, 2374. That 
failure is handled by a retry on the client side, and deduplication then 
happens as usual via the amq dedup id.
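
To illustrate why the retry cannot help here, below is a minimal, self-contained sketch. It is not the attached QueueSender.java and it uses no JMS or Artemis API: TxSession, FakeBroker, sendAll and findMissing are hypothetical names, and the "silent loss" is simulated in memory purely on the assumption that the broker reports success without storing the message. The retry loop only reacts to an exception from commit(), so a commit that returns normally is never retried, and the gap is only visible when the stored ids are counted afterwards.

```java
import java.util.ArrayList;
import java.util.List;

final class CommitRetrySketch {

    /** Hypothetical stand-in for a transacted session; not a JMS or Artemis type. */
    interface TxSession {
        void send(int id);
        void commit() throws Exception;
    }

    /** In-memory fake broker that simulates the two behaviours seen in the logs. */
    static final class FakeBroker implements TxSession {
        final List<Integer> stored = new ArrayList<>();
        private final int silentLossId; // commit returns normally but nothing is stored
        private final int failOnceId;   // first commit throws, the retry succeeds
        private boolean failedOnce = false;
        private int pending;

        FakeBroker(int silentLossId, int failOnceId) {
            this.silentLossId = silentLossId;
            this.failOnceId = failOnceId;
        }

        @Override
        public void send(int id) {
            pending = id;
        }

        @Override
        public void commit() throws Exception {
            int id = pending;
            if (id == failOnceId && !failedOnce) {
                failedOnce = true;
                throw new Exception("simulated commit failure for " + id);
            }
            if (id == silentLossId) {
                return; // simulated bug: success is reported, message is dropped
            }
            if (!stored.contains(id)) { // crude stand-in for duplicate detection
                stored.add(id);
            }
        }
    }

    /** Sends ids first..last, retrying a message only when commit() throws. */
    static void sendAll(TxSession session, int first, int last, int maxAttempts) {
        for (int id = first; id <= last; id++) {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    session.send(id);
                    session.commit();
                    break; // commit returned normally, so the client moves on
                } catch (Exception e) {
                    // commit failed visibly: loop again and resend the same id
                }
            }
        }
    }

    /** Returns every id in first..last that is absent from the stored list. */
    static List<Integer> findMissing(List<Integer> stored, int first, int last) {
        List<Integer> missing = new ArrayList<>();
        for (int id = first; id <= last; id++) {
            if (!stored.contains(id)) {
                missing.add(id);
            }
        }
        return missing;
    }
}
```

With new FakeBroker(2373, 2374) and sendAll(broker, 2370, 2376, 3), message 2374 is retried and stored, while 2373 is lost without the client noticing; findMissing(broker.stored, 2370, 2376) then reports only 2373.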

Thanks again.

 

> Message loss during failover switch in shared store configuration
> -----------------------------------------------------------------
>
>                 Key: ARTEMIS-5895
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-5895
>             Project: Artemis
>          Issue Type: Bug
>          Components: OpenWire
>    Affects Versions: 2.44.0
>            Reporter: Claudiu Chioasca
>            Priority: Critical
>         Attachments: 2372.png, 2373_missing.png, 2374.png, 
> FailoverApplicationTests.java, QueueSender.java, artemis.log, 
> failover-queue.png, failover-test-automation.ps1, 
> producer-bug-detected-iteration-82.log
>
>
> Sometimes, a message producer connected via OPENWIRE protocol and calling 
> commit() over a transacted session is not signaled with an exception when 
> failover switch happens and the commit fails. 
>  
> My test consists of: 
>  
>  - primary/backup artemis instances deployed with shared store configuration 
> (2.44.0)
>  
>  - a JDK21 spring boot (4.0.1) based producer:
>  
> <dependency>
>   <groupId>org.springframework.boot</groupId>
>   <artifactId>spring-boot-starter-activemq</artifactId>
> </dependency>
>  
> that connects to the broker via failover url: 
> failover:(ssl://LOCAL-DEV:5176,ssl://LOCAL-DEV:4176)
>  
>  - this scenario: while both primary & backup are up, producer starts sending 
> 10000 messages to "failover-queue" destination, during this time the primary 
> instance is shut down using "artemis stop". The producer is configured to 
> retry when session.commit() fails
>  
>  - a script to repeat the same sequence of steps until message loss is 
> detected: restart brokers, purge test destination, execute spring boot test, 
> shut down primary when messages start to appear in test destination, count 
> the messages when the test finishes
>  
> I let the script run for a couple of hours until the loss was reproduced; 
> producer-bug-detected-iteration-82.log shows the output of the producer + 
> script detecting the loss.
> I attached the primary instance log at the time it was stopping and message 
> #2373 was lost. 
> The 2373_missing.png is a capture of the Artemis console for the 
> failover-queue destination, where it can be seen that 2372 & 2374 are 
> consecutive.
> The producer log shows that the first send of 2374 is rolled back, then 
> retried as expected, but the send of 2373 appears successful.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
