[ https://issues.apache.org/jira/browse/IGNITE-25539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955826#comment-17955826 ]
Sergey Chugunov edited comment on IGNITE-25539 at 6/3/25 7:46 AM:
------------------------------------------------------------------

The same fix is applicable to two other tests: *{{testPendingMessagesOverflow}}* and *{{testCustomMessageInSingletonCluster}}*.

Test failures were caused by a race in the test code:
{code:java}
startGrid("listener");        // line 1
sentEnsuredMsgs.clear();      // line 2
receivedEnsuredMsgs.clear();  // line 3
{code}
When the new node started at line 1 joins an existing cluster, the coordinator generates a *{{CacheAffinityChangeMessage}}* and sends it across the ring. This message is added to the *{{sentEnsuredMsgs}}* collection on the coordinator (almost immediately) and to the *{{receivedEnsuredMsgs}}* collection on the listener node (with some delay, as it has to travel across the whole ring).

However, both collections are cleared in the runner thread without any pause (lines 2 and 3), which creates a race condition: if the *{{CacheAffinityChangeMessage}}* is delayed a bit more, it is added to *{{receivedEnsuredMsgs}}* AFTER the collection is cleared and subsequently causes the test assertion to fail.

was (Author: sergeychugunov):
    The same fix is applicable to two other tests: *{{testPendingMessagesOverflow}}* and *{{testCustomMessageInSingletonCluster}}*

> TcpDiscoveryPendingMessageDeliveryTest is flaky on TC
> -----------------------------------------------------
>
>                 Key: IGNITE-25539
>                 URL: https://issues.apache.org/jira/browse/IGNITE-25539
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Sergey Chugunov
>            Assignee: Sergey Chugunov
>            Priority: Major
>             Fix For: 2.18
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Test testDeliveryAllFailedMessagesInCorrectOrder is
> [flaky|https://ci2.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-4230688419866011807&tab=testDetails]
> on TC with a high failure rate.
> Failures are reproducible locally with a much lower failure rate.
> It seems from the logs that the discovery ring does not collapse the way the test
> expects; some investigation is needed.
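
The race described in the comment (a delayed *{{CacheAffinityChangeMessage}}* landing in *{{receivedEnsuredMsgs}}* after the runner thread has already cleared it) can be illustrated outside Ignite with a minimal sketch. Everything here is an assumption for illustration: the class {{ClearRaceSketch}}, the method {{staleAfterClear}} and the latch-based wait are hypothetical stand-ins, and the comment does not say what the actual fix in the test looks like. The sketch forces the bad interleaving deterministically, then shows one possible mitigation: wait for the message to arrive before clearing.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical, Ignite-free model of the clear-vs-late-delivery race. */
class ClearRaceSketch {
    /**
     * Models a runner thread that clears a collection while a "listener" thread
     * records a message in it.
     *
     * @param waitForDelivery if true, the runner waits until the message has been
     *                        recorded before clearing (one possible mitigation,
     *                        not necessarily the fix applied in the ticket).
     * @return number of stale messages left in the collection after the clear.
     */
    static int staleAfterClear(boolean waitForDelivery) {
        // Stand-in for the test's receivedEnsuredMsgs collection.
        List<String> receivedEnsuredMsgs = new CopyOnWriteArrayList<>();
        CountDownLatch cleared = new CountDownLatch(1);
        CountDownLatch delivered = new CountDownLatch(1);

        Thread listener = new Thread(() -> {
            try {
                if (!waitForDelivery) {
                    // Force the problematic ordering: the message arrives only after the clear.
                    cleared.await();
                }
                receivedEnsuredMsgs.add("CacheAffinityChangeMessage");
                delivered.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        listener.start();

        try {
            if (waitForDelivery && !delivered.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("Message was not delivered in time");
            }
            receivedEnsuredMsgs.clear(); // corresponds to "line 3" in the comment
            cleared.countDown();
            listener.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return receivedEnsuredMsgs.size();
    }

    public static void main(String[] args) {
        System.out.println("stale without wait: " + staleAfterClear(false));
        System.out.println("stale with wait:    " + staleAfterClear(true));
    }
}
```

In the first call the late message survives the clear and would trip a later assertion on the collection's contents; in the second call the clear happens only after delivery, so nothing stale remains. In a real test the wait would have to be keyed on an observable condition, such as the collection containing the expected message, rather than on a latch the test controls.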