ibessonov commented on code in PR #1604:
URL: https://github.com/apache/ignite-3/pull/1604#discussion_r1094194205


##########
modules/network/src/integrationTest/java/org/apache/ignite/network/scalecube/ItScaleCubeNetworkMessagingTest.java:
##########
@@ -320,6 +322,39 @@ public void testMessageGroupsHandlers(TestInfo testInfo) 
throws Exception {
         assertThat(networkMessageFuture, willBe(equalTo(networkMessage)));
     }
 
+    /**
+     * Tests removal from physical topology.
+     *
+     * @param testInfo Test info.
+     * @throws Exception If failed.
+     */
+    @Test
+    public void removalFromPhysicalTopologyRemovesAndFiresEvent(TestInfo 
testInfo) throws Exception {

Review Comment:
   I don't understand this test. There's a latch, but no one is waiting for it. 
The test is probably wrong or incomplete, please revisit it



##########
modules/cli/src/integrationTest/java/org/apache/ignite/internal/cli/IntegrationTestBase.java:
##########
@@ -117,7 +117,8 @@ public class IntegrationTestBase extends 
BaseIgniteAbstractTest {
             + "        gossipInterval: 10\n"
             + "      },\n"
             + "    }\n"
-            + "  }\n"
+            + "  },"
+            + "  cluster.failoverTimeout: 0\n"

Review Comment:
   Will this make tests less stable? What's the reason?



##########
modules/network/src/main/java/org/apache/ignite/network/scalecube/ScaleCubeTopologyService.java:
##########
@@ -77,41 +87,30 @@ void onMembershipEvent(MembershipEvent event) {
         ClusterNode member = fromMember(event.member(), metadata);
 
         if (event.isAdded()) {
-            members.put(member.address(), member);
-            consistentIdToMemberMap.put(member.name(), member);
+            // When a node appears in DT, it immediately appears in PT.
+            consistentIdsToDiscoveredMembers.put(member.name(), member);
+
+            physicalTopologyMembers.put(member.address(), member);
+            consistentIdsToPhysicalTopologyMembers.put(member.name(), member);
 
             LOG.info("Node joined [node={}]", member);
 
             fireAppearedEvent(member);
         } else if (event.isUpdated()) {
-            members.put(member.address(), member);
-            consistentIdToMemberMap.put(member.name(), member);
+            consistentIdsToDiscoveredMembers.put(member.name(), member);
+
+            physicalTopologyMembers.put(member.address(), member);
+            consistentIdsToPhysicalTopologyMembers.put(member.name(), member);
         } else if (event.isRemoved()) {
-            members.compute(member.address(), (addr, node) -> {
-                // Ignore stale remove event.
-                if (node == null || node.id().equals(member.id())) {
-                    return null;
-                } else {
-                    return node;
-                }
-            });
-
-            consistentIdToMemberMap.compute(member.name(), (consId, node) -> {
-                // Ignore stale remove event.
-                if (node == null || node.id().equals(member.id())) {
-                    return null;
-                } else {
-                    return node;
-                }
-            });
-
-            LOG.info("Node left [member={}]", member);
-
-            fireDisappearedEvent(member);
+            removeFromConsistentIdsToMembers(member, 
consistentIdsToDiscoveredMembers);
+
+            LOG.info("Node left Discovery Topology [member={}]", member);

Review Comment:
   I don't think that it makes sense to call a set of swim nodes a "discovery 
topology", let's discuss this. From my point of view, something like 
"fast-fail" flag in "send" method would be enough, client would just get an 
exception if there's no channel at this point in time, and retry the request 
later. No need for extra topology service



##########
modules/network/src/main/java/org/apache/ignite/network/scalecube/ScaleCubeTopologyService.java:
##########
@@ -132,19 +134,45 @@ void updateLocalMetadata(@Nullable NodeMetadata metadata) 
{
      * @param member Appeared cluster member.
      */
     private void fireAppearedEvent(ClusterNode member) {
+        // Appearing in the Discovery Topology is equivalent to appearing in 
the Physical topology, so we trigger
+        // PT.onAppeared() right away.
+
         for (TopologyEventHandler handler : getEventHandlers()) {
-            handler.onAppeared(member);
+            try {
+                handler.onAppeared(member);
+            } catch (Exception | AssertionError e) {

Review Comment:
   Wow, why do we specifically catch assertion errors? Why not "Throwable" then?



##########
modules/runner/src/integrationTest/java/org/apache/ignite/internal/AbstractClusterIntegrationTest.java:
##########
@@ -71,7 +75,8 @@ public abstract class AbstractClusterIntegrationTest extends 
BaseIgniteAbstractT
             + "        gossipInterval: 10\n"
             + "      },\n"
             + "    }\n"
-            + "  }\n"
+            + "  },"
+            + "  cluster.failoverTimeout: 0\n"

Review Comment:
   Looks out of place, to be honest. Maybe it should be a part of network 
configuration, let's consider that a possibility



##########
modules/runner/src/integrationTest/java/org/apache/ignite/internal/compute/ItLogicalTopologyTest.java:
##########
@@ -170,6 +175,96 @@ public void onDisappeared(ClusterNode disappearedNode, 
LogicalTopologySnapshot n
         assertTrue(secondIgniteDisappeared.await(10, TimeUnit.SECONDS), "Did 
not see second node leaving in time");
     }
 
+    @Test
+    
@BootstrapConfigTemplateMethod("templateWithVeryLongDelayToRemoveFromLogicalTopology")

Review Comment:
   Is there a shorter name for "DelayToRemoveFromLogicalTopology"? I bet there 
is



##########
modules/runner/src/integrationTest/java/org/apache/ignite/internal/BootstrapConfigTemplateMethod.java:
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal;
+
+import java.lang.annotation.ElementType;
+import java.lang.annotation.Retention;
+import java.lang.annotation.RetentionPolicy;
+import java.lang.annotation.Target;
+
+/**
+ * Defines name of a static method defined in a test class that will be 
invoked to produce a bootstrap config
+ * template to use intead of a default template.
+ */
+@Retention(RetentionPolicy.RUNTIME)
+@Target({ElementType.METHOD})
+public @interface BootstrapConfigTemplateMethod {

Review Comment:
   I wonder if this annotation and the abstract test should be in test 
fixtures? Just a thought, you don't need to move them



##########
modules/network/src/main/java/org/apache/ignite/internal/network/discovery/DiscoveryTopologyEventListener.java:
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.network.discovery;
+
+import org.apache.ignite.network.ClusterNode;
+
+/**
+ * Interface for handling events related to discovery topology changes. 
Discovery Topology (DT) is the set of nodes
+ * that were discovered by a discovery mechanism (like a SWIM protocol 
implementation) and which did not leave (from
+ * the point of view of the discovery mechanism) yet.
+ */
+public interface DiscoveryTopologyEventListener {

Review Comment:
   So, the terminology is all messed up. We have:
   - TopologyEventHandler
   - LogicalTopologyEventListener
   - DiscoveryTopologyEventListener
   and they all share the same set of names for events. This is confusing, and 
we should probably fix it in the future.
   Anyway, why do we need a third class? What's the difference between this 
event and "TopologyEventHandler"? Should there be a difference? Too many 
entities is bad for understanding them, let's try to make it simpler



##########
modules/cluster-management/src/main/java/org/apache/ignite/internal/cluster/management/ClusterManagementGroupManager.java:
##########
@@ -664,6 +679,26 @@ private void sendWithRetry(ClusterNode node, 
NetworkMessage msg, CompletableFutu
                 });
     }
 
+    @Override
+    public void beforeNodeStop() {
+        CompletableFuture<CmgRaftService> serviceFuture = raftService;
+
+        if (serviceFuture != null) {
+            tryToLeaveLogicalTopologyGracefully(serviceFuture);
+        }
+    }
+
+    private void 
tryToLeaveLogicalTopologyGracefully(CompletableFuture<CmgRaftService> 
serviceFuture) {
+        serviceFuture.thenCompose(service -> 
service.removeFromCluster(Set.of(clusterService.topologyService().localMember())))
+                .orTimeout(GRACEFUL_LEAVE_TIMEOUT_MILLIS, 
TimeUnit.MILLISECONDS)
+                .exceptionally(ex -> {
+                    LOG.info("Was not able to send a bye-bye in time", ex);
+
+                    return null;
+                })
+                .join();

Review Comment:
   Should we really wait for it?



##########
modules/raft/src/test/java/org/apache/ignite/raft/jraft/rpc/IgniteRpcTest.java:
##########
@@ -107,6 +110,15 @@ public IgniteRpcTest(TestInfo testInfo) {
 
         service.start();
 
+        // Remove a node from Physical Topology as soon as it gets lost by 
SWIM.
+        InternalTopologyService internalTopologyService = 
(InternalTopologyService) service.topologyService();

Review Comment:
   This is interesting, why did you have to explicitly add this to a test? Is 
there a way to have this listener "by default" when starting all necessary 
components?



##########
modules/runner/src/integrationTest/java/org/apache/ignite/internal/compute/ItLogicalTopologyTest.java:
##########
@@ -170,6 +175,96 @@ public void onDisappeared(ClusterNode disappearedNode, 
LogicalTopologySnapshot n
         assertTrue(secondIgniteDisappeared.await(10, TimeUnit.SECONDS), "Did 
not see second node leaving in time");
     }
 
+    @Test
+    
@BootstrapConfigTemplateMethod("templateWithVeryLongDelayToRemoveFromLogicalTopology")
+    void nodeDoesNotLeaveLogicalTopologyImmediatelyAfterBeingLostBySwim() 
throws Exception {
+        IgniteImpl entryNode = node(0);
+
+        startNode(1);
+
+        entryNode.logicalTopologyService().addEventListener(listener);
+
+        // Knock the node out without allowing it to say good bye.
+        cluster.knockOutNode(1, NodeKnockout.PARTITION_NETWORK);
+
+        assertFalse(waitForCondition(() -> !events.isEmpty(), 3_000));

Review Comment:
   Can we somehow make it faster than 3 seconds?
   Please add a comment why this value is chosen, it must depend on other 
settings, right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to