[ 
https://issues.apache.org/jira/browse/IGNITE-25539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955826#comment-17955826
 ] 

Sergey Chugunov edited comment on IGNITE-25539 at 6/3/25 7:46 AM:
------------------------------------------------------------------

The same fix is applicable to two other tests: 
*{{testPendingMessagesOverflow}}* and *{{testCustomMessageInSingletonCluster}}*

The test failures were caused by a race in the test code:
{code:java}
startGrid("listener"); // line 1
sentEnsuredMsgs.clear(); // line 2
receivedEnsuredMsgs.clear(); // line 3{code}
When the new node started at line 1 joins an existing cluster, the coordinator 
generates a *{{CacheAffinityChangeMessage}}* and sends it across the ring.

This message is added to the *{{sentEnsuredMsgs}}* collection on the coordinator 
(almost immediately) and to the *{{receivedEnsuredMsgs}}* collection on the 
listener node (with some delay, as it has to travel across the whole ring).

However, the runner thread clears both collections without any pause (lines 2 
and 3), which creates a race condition: if the *{{CacheAffinityChangeMessage}}* 
is delayed a bit longer, it is added to *{{receivedEnsuredMsgs}}* AFTER the 
collection is cleared, and the test assertion subsequently fails.
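
The race described above can be reproduced in isolation with a minimal sketch that uses only the JDK, not Ignite classes. The class name {{ClearRaceSketch}}, the {{run}} method and the {{waitForDelivery}} flag are hypothetical and exist only for illustration. The latches force the worst-case ordering deterministically instead of relying on timing. Waiting for delivery before clearing is shown as one possible way to close such a race; it is an assumption and not necessarily the fix applied in this ticket.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class ClearRaceSketch {
    /**
     * Models the runner thread. Returns the content of the "received" collection
     * after both the clear and the message delivery have completed.
     * Hypothetical helper, not part of the Ignite test.
     */
    static List<String> run(boolean waitForDelivery) {
        List<String> receivedEnsuredMsgs = new CopyOnWriteArrayList<>();
        CountDownLatch cleared = new CountDownLatch(1);
        CountDownLatch delivered = new CountDownLatch(1);

        // Stands in for the message travelling across the discovery ring.
        Thread ring = new Thread(() -> {
            try {
                // Without the wait, force the bad ordering: deliver only after the clear.
                if (!waitForDelivery)
                    cleared.await();

                receivedEnsuredMsgs.add("CacheAffinityChangeMessage");
                delivered.countDown();
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        ring.start();

        try {
            // Possible remedy: block until the listener has seen the message.
            if (waitForDelivery && !delivered.await(5, TimeUnit.SECONDS))
                throw new IllegalStateException("message was not delivered in time");

            receivedEnsuredMsgs.clear();
            cleared.countDown();

            ring.join();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        return receivedEnsuredMsgs;
    }

    public static void main(String[] args) {
        System.out.println("clear without waiting: " + run(false));
        System.out.println("clear after delivery:  " + run(true));
    }
}
```

In the first call the late message survives the clear and would trip an assertion that expects an empty collection; in the second call the collection is empty because the clear happens only after delivery.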


was (Author: sergeychugunov):
The same fix is applicable to two other tests: 
*{{testPendingMessagesOverflow}}* and *{{testCustomMessageInSingletonCluster}}*

 

> TcpDiscoveryPendingMessageDeliveryTest is flaky on TC
> -----------------------------------------------------
>
>                 Key: IGNITE-25539
>                 URL: https://issues.apache.org/jira/browse/IGNITE-25539
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Sergey Chugunov
>            Assignee: Sergey Chugunov
>            Priority: Major
>             Fix For: 2.18
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Test testDeliveryAllFailedMessagesInCorrectOrder is 
> [flaky|https://ci2.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-4230688419866011807&tab=testDetails]
>  on TC with a high failure rate.
> Failures are reproducible locally, but at a much lower rate.
> The logs suggest that the discovery ring does not collapse the way the test 
> expects; further investigation is needed.


