rpuch commented on code in PR #6408:
URL: https://github.com/apache/ignite-3/pull/6408#discussion_r2287247201


##########
modules/cluster-management/src/main/java/org/apache/ignite/internal/cluster/management/ClusterManagementGroupManager.java:
##########
@@ -979,10 +1024,10 @@ private CompletableFuture<Void> 
updateLearnersSerially(CmgRaftService service, l
                             return service.updateLearners(term);
                         }
                     })
-                    .whenComplete((unused, throwable) -> {
-                        if (throwable != null) {
-                            LOG.error("Failed to reset learners", throwable);
-                        }
+                    .exceptionally(e -> {
+                        LOG.warn("Failed to update learners for term {}", e, 
term);

Review Comment:
   Are we ok with proceeding after a failed update? Should we fail the node?
   
   @sanpwc I would also want to hear your opinion



##########
modules/cluster-management/src/integrationTest/java/org/apache/ignite/internal/cluster/management/ItClusterManagerTest.java:
##########
@@ -317,6 +337,15 @@ void testNoConfigurationReordering() throws Exception {
         // Wait for the initial cluster reconfiguration to complete.
         assertLearnerSize(2);
 
+        // Same as above, but check the all 5 nodes see the same number of 
learners (which is 2 actually).

Review Comment:
   ```suggestion
           // Same as above, but check that all 5 nodes see the same number of 
learners (which is 2 actually).
   ```



##########
modules/cluster-management/src/integrationTest/java/org/apache/ignite/internal/cluster/management/ItClusterManagerTest.java:
##########
@@ -317,6 +337,15 @@ void testNoConfigurationReordering() throws Exception {
         // Wait for the initial cluster reconfiguration to complete.
         assertLearnerSize(2);
 
+        // Same as above, but check the all 5 nodes see the same number of 
learners (which is 2 actually).
+        assertTrue(waitForCondition(() ->
+                        configs.size() == 5 && configs.values().stream()
+                                .map(list -> list.get(list.size() - 1))
+                                .mapToInt(raftGroupConfiguration -> 
raftGroupConfiguration.learners().size())
+                                .min().orElseThrow() == 2,

Review Comment:
   It's not the most obvious way to check that every node sees exactly 2 
learners. Would it make sense to switch to `allMatch()`?



##########
modules/cluster-management/src/integrationTest/java/org/apache/ignite/internal/cluster/management/ItClusterManagerTest.java:
##########
@@ -340,60 +369,91 @@ void testNoConfigurationReordering() throws Exception {
             return false;
         });
 
-        logger().info("Stop last node [4].");
-        MockNode last = cluster.remove(cluster.size() - 1);
-        stopNodes(List.of(last));
+        logger().info("Stop the node [4].");
+        MockNode node4 = cluster.remove(cluster.size() - 1);
+        stopNodes(List.of(node4));
 
-        logger().info("Stop last node [3].");
-        MockNode last2 = cluster.remove(cluster.size() - 1);
-        stopNodes(List.of(last2));
+        logger().info("Stop the node [3].");
+        MockNode node3 = cluster.remove(cluster.size() - 1);
+        stopNodes(List.of(node3));
 
         // There should be still two learner nodes since the previous 
reconfiguration was blocked.
         assertLearnerSize(2);
 
+        // The configs will be properly updated when new 3 and 4 are started, 
so remove the history for the stopped nodes.
+        configs.remove(3);
+        configs.remove(4);
+
         // Start nodes 3 and 4 back, so that the topology is back to normal 
and no node availability issues are expected.
         logger().info("Start nodes [3] and [4].");
-        // Start node 4 first to avoid clashing with the earlier blocked 
message..
-        startNode(4, 5);
-        startNode(3, 5);
-
-        logger().info("Nodes started.");
+        // Start node 4 first to avoid clashing with the earlier blocked 
message (as we get ResetLearnersRequest(3)
+        // both when we move from [3, 4] to [3] and when we move from [] to 
[3]).
+        startNode(4, 5, config ->
+                configs.computeIfAbsent(4, k -> new 
CopyOnWriteArrayList<>()).add(config)
+        );
+        startNode(3, 5, config ->
+                configs.computeIfAbsent(3, k -> new 
CopyOnWriteArrayList<>()).add(config)
+        );
 
-        // Waif for the nodes 3 and 4 to start.
+        // Wait for the nodes 3 and 4 to start.
         for (MockNode node : cluster) {
             assertThat(node.clusterManager().joinFuture(), 
willCompleteSuccessfully());
         }
 
+        logger().info("Nodes started.");
         assertLearnerSize(2);
 
-        for (MockNode node : cluster) {
-            Boolean leader = node.clusterManager().isCmgLeader().get();
-            if (leader) {
-                logger().info("lerner nodes {}", 
node.clusterManager().learnerNodes().get());
-            }
-        }
-
         // Unblock the first reconfiguration.
         logger().info("Unblock message.");
         blockMessage.set(false);
 
-        assertLearnerSize(2);
+        // Now we need to wait for the reconfiguration to complete.
+        // To check it we will look through the raft configuration history and 
verify that all nodes have same transition history
+        // with regards to the learner nodes: [] -> [3] -> [3, 4] -> [4] -> 
[3, 4].
+        // Basically should be enough to check learners size only.
+        int[] count = {0, 1, 2, 1, 2};

Review Comment:
   ```suggestion
           int[] counts = {0, 1, 2, 1, 2};
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@ignite.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to