szetszwo commented on code in PR #1166:
URL: https://github.com/apache/ratis/pull/1166#discussion_r1797169395


##########
ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java:
##########
@@ -1921,4 +1923,5 @@ void onGroupLeaderElected() {
   boolean isRunning() {
     return startComplete.get() && lifeCycle.getCurrentState() == State.RUNNING;
   }
+

Review Comment:
   Please revert this whitespace change.



##########
ratis-server/src/test/java/org/apache/ratis/server/impl/MiniRaftCluster.java:
##########
@@ -346,8 +346,8 @@ public void start() throws IOException {
     LOG.info(".............................................................. 
");
 
     initServers();
-    startServers(servers.values());
 
+    startServers(servers.values());

Review Comment:
   Please revert this whitespace change.



##########
ratis-server/src/test/java/org/apache/ratis/server/impl/LeaderElectionTests.java:
##########
@@ -97,6 +100,70 @@ public void testBasicLeaderElection() throws Exception {
     cluster.shutdown();
   }
 
+  static class SleepCode implements CodeInjectionForTesting.Code {
+    private final long sleepMs;
+
+    SleepCode(long sleepMs) {
+      this.sleepMs = sleepMs;
+    }
+
+    @Override
+    public boolean execute(Object localId, Object remoteId, Object... args) {
+      try {
+        LOG.info("{}: Simulate RaftServer startup blocking", localId);
+        Thread.sleep(sleepMs);
+      } catch (InterruptedException e) {
+        throw new RuntimeException(e);
+      }
+      return true;
+    }
+  }
+
+  @Test
+  public void testWaitServerReady() throws Exception {
+    LOG.info("Running testWaitServerReady");
+    int []arr = {1, 2, 3, 4, 5};
+    for (int i : arr) {
+      final MiniRaftCluster cluster = newCluster(1);
+      CodeInjectionForTesting.put(RaftServerImpl.START_COMPLETE, new 
SleepCode(1000));
+      cluster.start();
+      // Leader will be elected if the server is ready
+      Assertions.assertTrue(RaftTestUtil.waitForLeader(cluster).getId() != 
null);
+      cluster.shutdown();
+    }
+  }

Review Comment:
   For testing the bug, I suggest the following change to check the timestamps:
   ```java
     @Test
     public void testWaitServerReady() throws Exception {
       final int sleepMs = 1000 + ThreadLocalRandom.current().nextInt(1000);
       LOG.info("Running testWaitServerReady, sleep = {}ms", sleepMs);
       CodeInjectionForTesting.put(RaftServerImpl.START_COMPLETE, new 
SleepCode(sleepMs));
       final MiniRaftCluster cluster = newCluster(1);
       final Timestamp startTime = Timestamp.currentTime();
       cluster.start();
       LOG.info("Cluster started at {}ms", startTime.elapsedTimeMs());
       final RaftGroupId groupId = cluster.getGroupId();
       final RaftServerImpl server = (RaftServerImpl) 
cluster.getServers().iterator().next().getDivision(groupId);
       final boolean isRunning = server.isRunning();
       LOG.info("{} isRunning at {}ms? {}", server.getId(), 
startTime.elapsedTimeMs(), isRunning);
   
       // Leader will be elected if the server is ready
       Assertions.assertNotNull(waitForLeader(cluster), "No leader is 
elected.");
       final long elapsedMs = startTime.elapsedTimeMs();
       // allow a small difference to tolerate system timer inaccuracy
       Assertions.assertTrue(elapsedMs > sleepMs - 10, () -> "elapseMs = " + 
elapsedMs + " but sleepMs = " + sleepMs);
       cluster.shutdown();
       CodeInjectionForTesting.remove(RaftServerImpl.START_COMPLETE);
     }
   ```
   
   With the new log message, we will see a lot of "first-election.timeout" logs 
and then "Successfully started".  It seems good enough.
   ```
   2024-10-11 09:12:52,420 [s0-impl-thread1] INFO  util.CodeInjectionForTesting 
(LeaderElectionTests.java:execute(112)) - s0: Simulate RaftServer startup 
blocking, sleep 1686ms
   2024-10-11 09:12:52,420 [s0-impl-thread1] INFO  util.CodeInjectionForTesting 
(LeaderElectionTests.java:execute(112)) - s0: Simulate RaftServer startup 
blocking, sleep 1686ms
   2024-10-11 09:12:52,571 [s0@group-C119E77DBEA6-FollowerState] INFO  
server.RaftServerConfigKeys (ConfUtils.java:logFallback(53)) - 
raft.server.rpc.first-election.timeout.min = 150ms (fallback to 
raft.server.rpc.timeout.min)
   2024-10-11 09:12:52,571 [s0@group-C119E77DBEA6-FollowerState] INFO  
server.RaftServerConfigKeys (ConfUtils.java:logFallback(53)) - 
raft.server.rpc.first-election.timeout.max = 300ms (fallback to 
raft.server.rpc.timeout.max)
   2024-10-11 09:12:52,758 [s0@group-C119E77DBEA6-FollowerState] INFO  
server.RaftServerConfigKeys (ConfUtils.java:logFallback(53)) - 
raft.server.rpc.first-election.timeout.min = 150ms (fallback to 
raft.server.rpc.timeout.min)
   2024-10-11 09:12:52,758 [s0@group-C119E77DBEA6-FollowerState] INFO  
server.RaftServerConfigKeys (ConfUtils.java:logFallback(53)) - 
raft.server.rpc.first-election.timeout.max = 300ms (fallback to 
raft.server.rpc.timeout.max)
   ...
   2024-10-11 09:12:53,925 [s0@group-C119E77DBEA6-FollowerState] INFO  
server.RaftServerConfigKeys (ConfUtils.java:logFallback(53)) - 
raft.server.rpc.first-election.timeout.min = 150ms (fallback to 
raft.server.rpc.timeout.min)
   2024-10-11 09:12:53,925 [s0@group-C119E77DBEA6-FollowerState] INFO  
server.RaftServerConfigKeys (ConfUtils.java:logFallback(53)) - 
raft.server.rpc.first-election.timeout.max = 300ms (fallback to 
raft.server.rpc.timeout.max)
   2024-10-11 09:12:54,107 [s0-impl-thread1] INFO  server.RaftServer$Division 
(RaftServerImpl.java:start(407)) - s0@group-C119E77DBEA6: Successfully started.
   ```



##########
ratis-server/src/test/java/org/apache/ratis/server/impl/LeaderElectionTests.java:
##########
@@ -97,6 +100,70 @@ public void testBasicLeaderElection() throws Exception {
     cluster.shutdown();
   }
 
+  static class SleepCode implements CodeInjectionForTesting.Code {
+    private final long sleepMs;
+
+    SleepCode(long sleepMs) {
+      this.sleepMs = sleepMs;
+    }
+
+    @Override
+    public boolean execute(Object localId, Object remoteId, Object... args) {
+      try {
+        LOG.info("{}: Simulate RaftServer startup blocking", localId);
+        Thread.sleep(sleepMs);
+      } catch (InterruptedException e) {
+        throw new RuntimeException(e);
+      }
+      return true;
+    }
+  }
+
+  @Test
+  public void testWaitServerReady() throws Exception {
+    LOG.info("Running testWaitServerReady");
+    int []arr = {1, 2, 3, 4, 5};
+    for (int i : arr) {
+      final MiniRaftCluster cluster = newCluster(1);
+      CodeInjectionForTesting.put(RaftServerImpl.START_COMPLETE, new 
SleepCode(1000));
+      cluster.start();
+      // Leader will be elected if the server is ready
+      Assertions.assertTrue(RaftTestUtil.waitForLeader(cluster).getId() != 
null);
+      cluster.shutdown();
+    }
+  }
+
+  @Test
+  public void testAddServerForWaitReady() throws IOException, 
InterruptedException {

Review Comment:
   This test seems testing similar thing as the first test.  Let's remove it?



##########
ratis-server/src/test/java/org/apache/ratis/server/impl/LeaderElectionTests.java:
##########
@@ -97,6 +100,70 @@ public void testBasicLeaderElection() throws Exception {
     cluster.shutdown();
   }
 
+  static class SleepCode implements CodeInjectionForTesting.Code {
+    private final long sleepMs;
+
+    SleepCode(long sleepMs) {
+      this.sleepMs = sleepMs;
+    }
+
+    @Override
+    public boolean execute(Object localId, Object remoteId, Object... args) {
+      try {
+        LOG.info("{}: Simulate RaftServer startup blocking", localId);
+        Thread.sleep(sleepMs);
+      } catch (InterruptedException e) {
+        throw new RuntimeException(e);
+      }
+      return true;
+    }
+  }
+
+  @Test
+  public void testWaitServerReady() throws Exception {
+    LOG.info("Running testWaitServerReady");
+    int []arr = {1, 2, 3, 4, 5};
+    for (int i : arr) {
+      final MiniRaftCluster cluster = newCluster(1);
+      CodeInjectionForTesting.put(RaftServerImpl.START_COMPLETE, new 
SleepCode(1000));

Review Comment:
   Let add a `CodeInjectionForTesting.remove` method.  Otherwise, it will 
affect other tests.
   
   ```diff
   +++ 
b/ratis-common/src/main/java/org/apache/ratis/util/CodeInjectionForTesting.java
   @@ -55,6 +55,12 @@ public final class CodeInjectionForTesting {
        INJECTION_POINTS.put(injectionPoint, code);
      }
    
   +  /** Remove an injection point. */
   +  public static void remove(String injectionPoint) {
   +    final Code removed = INJECTION_POINTS.remove(injectionPoint);
   +    LOG.debug("remove({}): {}", injectionPoint, removed);
   +  }
   +
      /** Execute the injected code, if there is any. */
      public static boolean execute(String injectionPoint, Object localId,
   ```



##########
ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java:
##########
@@ -401,6 +402,7 @@ boolean start() throws IOException {
 
     jmxAdapter.registerMBean();
     state.start();
+    CodeInjectionForTesting.execute(START_COMPLETE, getId(), null, role);
     startComplete.compareAndSet(false, true);

Review Comment:
   Let's add a log message:
   ```java
       if (startComplete.compareAndSet(false, true)) {
         LOG.info("{}: Successfully started.", getMemberId());
       }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to