[ 
https://issues.apache.org/jira/browse/IGNITE-23661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Lapin updated IGNITE-23661:
-------------------------------------
    Description: 
The test may fail with
{code:java}
org.opentest4j.AssertionFailedError: expected: <true> but was: <false>
  at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
  at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
  at app//org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
  at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
  at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:31)
  at app//org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:183)
  at app//org.apache.ignite.internal.distributionzones.ItIgniteDistributionZoneManagerNodeRestartTest.testFirstLogicalTopologyUpdateInterruptedEventRestoredAfterRestart(ItIgniteDistributionZoneManagerNodeRestartTest.java:562)
  at [email protected]/java.lang.reflect.Method.invoke(Method.java:566)
  at [email protected]/java.util.ArrayList.forEach(ArrayList.java:1541)
  at [email protected]/java.util.ArrayList.forEach(ArrayList.java:1541)
{code}
Reproduced locally in 1 of 100 runs.

 

In short, the chain of causes behind the failure is as follows:

1. 
{code:java}
assertTrue(waitForCondition(() -> 
newLogicalTopology.equals(finalDistributionZoneManager.logicalTopology()){code}
fails because 
{code:java}
finalDistributionZoneManager.logicalTopology(){code}
is not updated because 
 
2.
notificationFuture in WatchProcessor#notifyWatches() is not completed because 
notifyUpdateRevisionFuture is not completed because 
 
3. 
{code:java}
assert causalityToken > lastCompleteToken{code}
in 
org.apache.ignite.internal.causality.IncrementalVersionedValue#completeInternal 
fails (although the failure is not logged) because
 
4.
watch events are reordered by revision even though they are properly ordered 
by timestamp; see 
org.apache.ignite.internal.metastorage.server.NotifyWatchProcessorEvent#compareTo 
for more details.
 
5.
All in all, the order mismatch is a result of StandaloneMetaStorageManager not 
being thread-safe. Both onBeforeApply and command processing within the 
listener should be thread-safe. In StandaloneMetaStorageManager they are not, 
which is why there is a race between assigning the operation's safeTime and 
calculating the revision. onBeforeApply is expected to be guarded by a 
group-specific monitor, namely
{code:java}
synchronized (groupIdSyncMonitor(request.groupId())){code}
See ActionRequestProcessor.handleRequestInternal for more details. Command 
processing, in turn, is expected to run under the Raft umbrella, that is, in a 
single-threaded environment.
The corresponding synchronisation was missed in StandaloneMetaStorageManager.
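
The race from steps 4 and 5 can be sketched roughly as follows. This is only an illustration, not the Ignite code: OrderingSketch, StampedCommand, the two counters and the monitor are hypothetical stand-ins for the safeTime assignment and the revision calculation. If another thread interleaves between the two assignments, sorting by revision reorders commands that were ordered by safeTime. Holding one monitor across both assignments keeps the two orders consistent.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model: each command gets a safe time first and a revision second. */
final class OrderingSketch {
    /** Hypothetical pair of values assigned to one command. */
    static final class StampedCommand {
        final long safeTime;
        final long revision;

        StampedCommand(long safeTime, long revision) {
            this.safeTime = safeTime;
            this.revision = revision;
        }
    }

    private final AtomicLong clock = new AtomicLong();
    private final AtomicLong revisions = new AtomicLong();
    private final Object monitor = new Object();

    /** Unsafe: another thread may run between the two assignments. */
    StampedCommand applyUnsynchronized() {
        long safeTime = clock.incrementAndGet();
        long revision = revisions.incrementAndGet();
        return new StampedCommand(safeTime, revision);
    }

    /** Safe: both assignments happen under one monitor, so they cannot interleave. */
    StampedCommand applySynchronized() {
        synchronized (monitor) {
            return applyUnsynchronized();
        }
    }

    /** Calls applySynchronized() from several threads and collects all results. */
    static List<StampedCommand> runConcurrently(int threads, int perThread) {
        OrderingSketch sketch = new OrderingSketch();
        List<StampedCommand> out = new CopyOnWriteArrayList<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    out.add(sketch.applySynchronized());
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return out;
    }

    /** True if sorting by revision yields the same sequence as sorting by safe time. */
    static boolean ordersAgree(List<StampedCommand> commands) {
        List<StampedCommand> byTime = new ArrayList<>(commands);
        byTime.sort(Comparator.comparingLong((StampedCommand c) -> c.safeTime));
        List<StampedCommand> byRevision = new ArrayList<>(commands);
        byRevision.sort(Comparator.comparingLong((StampedCommand c) -> c.revision));
        return byTime.equals(byRevision);
    }
}
```

An interleaved pair such as (safeTime=1, revision=2) and (safeTime=2, revision=1) makes ordersAgree return false, which is the kind of mismatch described in step 4.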
 
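
Steps 2 and 3 describe a failed check that leaves a future incomplete without anything being logged. One general way this can happen in Java is sketched below. The actual mechanism inside IncrementalVersionedValue is not shown in this issue, so SilentFailureSketch and its methods are hypothetical. The check is written as an explicit throw so that the sketch does not depend on the -ea flag.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical illustration of a check that fails silently on an executor thread. */
final class SilentFailureSketch {
    /** Completes {@code result} only if the token check passes. */
    static Future<?> completeOnExecutor(ExecutorService executor, CompletableFuture<Long> result,
            long causalityToken, long lastCompleteToken) {
        return executor.submit(() -> {
            if (causalityToken <= lastCompleteToken) {
                // The error is captured by the task's Future; nothing is printed or logged.
                throw new AssertionError("causalityToken must be greater than lastCompleteToken");
            }
            result.complete(causalityToken);
        });
    }

    /** Returns true if {@code result} is still incomplete after the task has finished. */
    static boolean staysIncomplete(long causalityToken, long lastCompleteToken) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<Long> result = new CompletableFuture<>();
            Future<?> task = completeOnExecutor(executor, result, causalityToken, lastCompleteToken);
            try {
                // Wait for the task to finish, regardless of how it finished.
                task.get(5, TimeUnit.SECONDS);
            } catch (ExecutionException expected) {
                // The AssertionError only surfaces here because get() was called explicitly.
            } catch (InterruptedException | TimeoutException e) {
                throw new IllegalStateException(e);
            }
            return !result.isDone();
        } finally {
            executor.shutdownNow();
        }
    }
}
```

With tokens (1, 2) the check fails and the result future is never completed, so anything waiting on it waits until its own timeout. With tokens (3, 2) the future completes normally.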

  was:
The test may fail with
{code:java}
org.opentest4j.AssertionFailedError: expected: <true> but was: <false>
  at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
  at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
  at app//org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
  at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
  at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:31)
  at app//org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:183)
  at app//org.apache.ignite.internal.distributionzones.ItIgniteDistributionZoneManagerNodeRestartTest.testFirstLogicalTopologyUpdateInterruptedEventRestoredAfterRestart(ItIgniteDistributionZoneManagerNodeRestartTest.java:562)
  at [email protected]/java.lang.reflect.Method.invoke(Method.java:566)
  at [email protected]/java.util.ArrayList.forEach(ArrayList.java:1541)
  at [email protected]/java.util.ArrayList.forEach(ArrayList.java:1541)
{code}
Reproduced locally in 1 of 100 runs.


> ItIgniteDistributionZoneManagerNodeRestartTest.testFirstLogicalTopologyUpdateInterruptedEventRestoredAfterRestart
>  is flaky
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-23661
>                 URL: https://issues.apache.org/jira/browse/IGNITE-23661
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexander Lapin
>            Assignee: Alexander Lapin
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 10m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.20.10#820010)