[ 
https://issues.apache.org/jira/browse/IGNITE-19238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Lapin updated IGNITE-19238:
-------------------------------------
    Fix Version/s: 3.0.0-beta2

> ItDataTypesTest is flaky
> ------------------------
>
>                 Key: IGNITE-19238
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19238
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexander Lapin
>            Assignee: Alexander Lapin
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0.0-beta2
>
>         Attachments: Снимок экрана от 2023-04-06 10-39-32.png
>
>
> 1. ItDataTypesTest is flaky because previous ItCreateTableDdlTest tests 
> failed to stop replicas on node stop:
> !Снимок экрана от 2023-04-06 10-39-32.png!
>  
> {code:java}
> java.lang.AssertionError: There are replicas alive 
> [replicas=[b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_21, 
> b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_6, 
> b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_13, 
> b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_8, 
> b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_9, 
> b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_11]]
>     at 
> org.apache.ignite.internal.replicator.ReplicaManager.stop(ReplicaManager.java:341)
>     at 
> org.apache.ignite.internal.app.LifecycleManager.lambda$stopAllComponents$1(LifecycleManager.java:133)
>     at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
>     at 
> org.apache.ignite.internal.app.LifecycleManager.stopAllComponents(LifecycleManager.java:131)
>     at 
> org.apache.ignite.internal.app.LifecycleManager.stopNode(LifecycleManager.java:115){code}
> 2. The reason why we failed to stop replicas is the race between 
> tablesToStopInCaseOfError cleanup and adding tables to tablesByIdVv.
> On TableManager stop, we stop and cleanup all table resources like replicas 
> and raft nodes
> {code:java}
> public void stop() {
>   ...
>   Map<UUID, TableImpl> tables = tablesByIdVv.latest();  // 1*
>   cleanUpTablesResources(tables); 
>   cleanUpTablesResources(tablesToStopInCaseOfError);
>   ...
> }{code}
> where tablesToStopInCaseOfError is a sort of pending tables list which one is 
> cleared on cfg storage revision update.
> tablesByIdVv *listens same storage revision update event* in order to publish 
> tables related to the given revision or in other words make such tables 
> accessible from tablesByIdVv.latest(); that one that is used in order to 
> retrieve tables for cleanup on components stop (see // 1* above)
> {code:java}
> public TableManager(
>   ... 
>   tablesByIdVv = new IncrementalVersionedValue<>(registry, HashMap::new);
>   registry.accept(token -> {
>     tablesToStopInCaseOfError.clear();
>     
>     return completedFuture(null);
>   });
>   {code}
> However inside IncrementalVersionedValue we have async storageRevision update 
> processing
>  
> {code:java}
> updaterFuture = updaterFuture.whenComplete((v, t) -> 
> versionedValue.complete(causalityToken, localUpdaterFuture)); {code}
> As a result it's possible that we will clear tablesToStopInCaseOfError before 
> publishing same revision tables to tablesByIdVv, so that we will miss that 
> cleared tables in tablesByIdVv.latest() which is used in TableManager#stop.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to