Claudiu Chioasca created ARTEMIS-5895:
-----------------------------------------

             Summary: Message loss during failover switch in shared store configuration
                 Key: ARTEMIS-5895
                 URL: https://issues.apache.org/jira/browse/ARTEMIS-5895
             Project: Artemis
          Issue Type: Bug
          Components: OpenWire
    Affects Versions: 2.44.0
            Reporter: Claudiu Chioasca


Message loss during failover switch in shared store configuration
 
Sometimes a message producer that is connected via the OpenWire protocol and 
calls commit() on a transacted session does not receive an exception when a 
failover switch happens and the commit fails. 
 
My test consists of: 
 
- primary/backup Artemis instances (2.44.0) deployed with shared store 
configuration
 
- a JDK 21 Spring Boot (4.0.1) based producer:
 
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-activemq</artifactId>
</dependency>
 
that connects to the broker via the failover URL: 
failover:(ssl://LOCAL-DEV:5176,ssl://LOCAL-DEV:4176)
 
- this scenario: while both primary and backup are up, the producer starts 
sending 10000 messages to the "failover-queue" destination. While it is 
sending, the primary instance is shut down using "artemis stop". The producer 
is configured to retry when session.commit() fails.
 
- a script that repeats the same sequence of steps until message loss is 
detected: restart the brokers, purge the test destination, run the Spring Boot 
test, shut down the primary once messages start to appear in the test 
destination, and count the messages when the test finishes
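
For readers who want to picture the producer side, the retry-on-commit-failure 
behaviour described above could look roughly like the sketch below. This is 
not the reporter's test code. TxSession, FakeSession, Stats and sendAll are 
hypothetical names, and TxSession is a stand-in for a transacted JMS session 
so that the example runs without a broker. The point it illustrates is that 
such a loop can only retry failures that surface as an exception from 
commit(); a commit that returns normally without being applied on the broker 
would still be counted as committed.

```java
import java.util.ArrayList;
import java.util.List;

final class CommitRetrySketch {

    /** Hypothetical stand-in for a transacted JMS session (not a real JMS type). */
    interface TxSession {
        void send(String body) throws Exception;
        void commit() throws Exception;   // assumed to throw when the commit fails
        void rollback() throws Exception;
    }

    /** Counters loosely modelled on the STAT lines of the test output. */
    static final class Stats {
        int attempted;
        int committed;
        int failed;
    }

    /** Sends 'total' messages, one per transaction, retrying a failed commit. */
    static Stats sendAll(TxSession session, int total, int maxRetries) {
        Stats stats = new Stats();
        for (int i = 0; i < total; i++) {
            boolean done = false;
            for (int attempt = 0; attempt <= maxRetries && !done; attempt++) {
                stats.attempted++;
                try {
                    session.send("message-" + i);
                    session.commit();
                    stats.committed++;   // only correct if commit() really succeeded
                    done = true;
                } catch (Exception e) {
                    stats.failed++;
                    try {
                        session.rollback();   // discard the failed transaction
                    } catch (Exception ignored) {
                        // the connection may be gone; the next attempt starts fresh
                    }
                }
            }
            if (!done) {
                throw new IllegalStateException("gave up on message " + i);
            }
        }
        return stats;
    }

    /** In-memory fake whose first N commits fail with an exception. */
    static final class FakeSession implements TxSession {
        final List<String> queue = new ArrayList<>();
        private final List<String> pending = new ArrayList<>();
        private int commitsToFail;

        FakeSession(int commitsToFail) {
            this.commitsToFail = commitsToFail;
        }

        @Override
        public void send(String body) {
            pending.add(body);
        }

        @Override
        public void commit() throws Exception {
            if (commitsToFail > 0) {
                commitsToFail--;
                pending.clear();
                throw new Exception("simulated failover during commit");
            }
            queue.addAll(pending);
            pending.clear();
        }

        @Override
        public void rollback() {
            pending.clear();
        }
    }

    public static void main(String[] args) {
        FakeSession session = new FakeSession(1);
        Stats stats = sendAll(session, 10, 3);
        System.out.println("attempted=" + stats.attempted
                + " committed=" + stats.committed
                + " failed=" + stats.failed
                + " inQueue=" + session.queue.size());
    }
}
```

In this fake, every failed commit throws, so the committed counter and the 
queue size always agree. The report describes the case where they do not.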
 
I let the script run for a couple of hours until the issue was reproduced:
 
========== FAILOVER TEST RESULTS ==========
STAT:TOTAL_ATTEMPTED=10000
STAT:SUCCESS_IN_LOOP=10000
STAT:ERROR_IN_LOOP=0
STAT:ATTEMPTED_COUNT=10001
STAT:COMMITTED_COUNT=10000
STAT:FAILED_COUNT=1
STAT:POTENTIALLY_LOST=0
============================================
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 42.48 s -- in com.mycompany.failover.FailoverApplicationTests
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  44.941 s
[INFO] Finished at: 2026-02-06T15:36:17+02:00
[INFO] ------------------------------------------------------------------------
[2026-02-06 15:36:22] [INFO] Querying message count for queue 'failover-queue' on broker...
Connection brokerURL = tcp://localhost:4175
|NAME          |ADDRESS       |CONSUMER|MESSAGE|MESSAGES|DELIVERING|MESSAGES|SCHEDULED|ROUTING|INTERNAL|
|              |              | COUNT  | COUNT | ADDED  |  COUNT   | ACKED  |  COUNT  | TYPE  |        |
|failover-queue|failover-queue|   0    | 9999  |  9999  |    0     |   0    |    0    |ANYCAST| false  |
[2026-02-06 15:36:25] [INFO] Queue 'failover-queue' has 9999 messages
[2026-02-06 15:36:25] [INFO] ========== ITERATION 82 RESULTS ==========
[2026-02-06 15:36:25] [INFO] Expected messages: 10000
[2026-02-06 15:36:25] [INFO] Client reported sent: 10000
[2026-02-06 15:36:25] [INFO] Actual messages in queue: 9999
[2026-02-06 15:36:25] [INFO] Kill delay was: 2299 ms (after messages started)
[2026-02-06 15:36:25] [INFO] Test failed: True
[2026-02-06 15:36:25] [ERROR] !!! BUG DETECTED !!! 1 messages lost (client sent 10000 but queue has 9999)
[2026-02-06 15:36:25] [INFO] Bug details saved to: C:\workspace\bugs\artemis\failover\bug-detected-iteration-82.log
[2026-02-06 15:36:25] [INFO] Stopping Backup broker (PID: 13076)...
[2026-02-06 15:36:25] [INFO] Backup broker stopped.
[2026-02-06 15:36:25] [INFO] Cleaning up any existing Artemis processes...
[2026-02-06 15:36:28] [ERROR] ========================================
[2026-02-06 15:36:28] [ERROR] !!! BUG REPLICATED AT ITERATION 82 !!!
[2026-02-06 15:36:28] [ERROR] Kill delay was: 2299 ms
[2026-02-06 15:36:28] [ERROR] Client sent: 10000 messages
[2026-02-06 15:36:28] [ERROR] Queue has: 9999 messages
[2026-02-06 15:36:28] [ERROR] Messages LOST: 1
[2026-02-06 15:36:28] [ERROR] ========================================



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
