szetszwo commented on code in PR #1166:
URL: https://github.com/apache/ratis/pull/1166#discussion_r1797169395
##########
ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java:
##########
@@ -1921,4 +1923,5 @@ void onGroupLeaderElected() {
boolean isRunning() {
return startComplete.get() && lifeCycle.getCurrentState() == State.RUNNING;
}
+
Review Comment:
Please revert this whitespace change.
##########
ratis-server/src/test/java/org/apache/ratis/server/impl/MiniRaftCluster.java:
##########
@@ -346,8 +346,8 @@ public void start() throws IOException {
LOG.info("..............................................................
");
initServers();
- startServers(servers.values());
+ startServers(servers.values());
Review Comment:
Please revert this whitespace change.
##########
ratis-server/src/test/java/org/apache/ratis/server/impl/LeaderElectionTests.java:
##########
@@ -97,6 +100,70 @@ public void testBasicLeaderElection() throws Exception {
cluster.shutdown();
}
+ static class SleepCode implements CodeInjectionForTesting.Code {
+ private final long sleepMs;
+
+ SleepCode(long sleepMs) {
+ this.sleepMs = sleepMs;
+ }
+
+ @Override
+ public boolean execute(Object localId, Object remoteId, Object... args) {
+ try {
+ LOG.info("{}: Simulate RaftServer startup blocking", localId);
+ Thread.sleep(sleepMs);
+ } catch (InterruptedException e) {
+ throw new RuntimeException(e);
+ }
+ return true;
+ }
+ }
+
+ @Test
+ public void testWaitServerReady() throws Exception {
+ LOG.info("Running testWaitServerReady");
+ int []arr = {1, 2, 3, 4, 5};
+ for (int i : arr) {
+ final MiniRaftCluster cluster = newCluster(1);
+ CodeInjectionForTesting.put(RaftServerImpl.START_COMPLETE, new
SleepCode(1000));
+ cluster.start();
+ // Leader will be elected if the server is ready
+ Assertions.assertTrue(RaftTestUtil.waitForLeader(cluster).getId() !=
null);
+ cluster.shutdown();
+ }
+ }
Review Comment:
For testing the bug, I suggest the following change to check the timestamps:
```java
@Test
public void testWaitServerReady() throws Exception {
final int sleepMs = 1000 + ThreadLocalRandom.current().nextInt(1000);
LOG.info("Running testWaitServerReady, sleep = {}ms", sleepMs);
CodeInjectionForTesting.put(RaftServerImpl.START_COMPLETE, new
SleepCode(sleepMs));
final MiniRaftCluster cluster = newCluster(1);
final Timestamp startTime = Timestamp.currentTime();
cluster.start();
LOG.info("Cluster started at {}ms", startTime.elapsedTimeMs());
final RaftGroupId groupId = cluster.getGroupId();
final RaftServerImpl server = (RaftServerImpl)
cluster.getServers().iterator().next().getDivision(groupId);
final boolean isRunning = server.isRunning();
LOG.info("{} isRunning at {}ms? {}", server.getId(),
startTime.elapsedTimeMs(), isRunning);
// Leader will be elected if the server is ready
Assertions.assertNotNull(waitForLeader(cluster), "No leader is
elected.");
final long elapsedMs = startTime.elapsedTimeMs();
// allow a small difference to tolerate system timer inaccuracy
Assertions.assertTrue(elapsedMs > sleepMs - 10, () -> "elapseMs = " +
elapsedMs + " but sleepMs = " + sleepMs);
cluster.shutdown();
CodeInjectionForTesting.remove(RaftServerImpl.START_COMPLETE);
}
```
With the new log message, we will see a lot of "first-election.timeout" logs
and then "Successfully started". It seems good enough.
```
2024-10-11 09:12:52,420 [s0-impl-thread1] INFO util.CodeInjectionForTesting
(LeaderElectionTests.java:execute(112)) - s0: Simulate RaftServer startup
blocking, sleep 1686ms
2024-10-11 09:12:52,420 [s0-impl-thread1] INFO util.CodeInjectionForTesting
(LeaderElectionTests.java:execute(112)) - s0: Simulate RaftServer startup
blocking, sleep 1686ms
2024-10-11 09:12:52,571 [s0@group-C119E77DBEA6-FollowerState] INFO
server.RaftServerConfigKeys (ConfUtils.java:logFallback(53)) -
raft.server.rpc.first-election.timeout.min = 150ms (fallback to
raft.server.rpc.timeout.min)
2024-10-11 09:12:52,571 [s0@group-C119E77DBEA6-FollowerState] INFO
server.RaftServerConfigKeys (ConfUtils.java:logFallback(53)) -
raft.server.rpc.first-election.timeout.max = 300ms (fallback to
raft.server.rpc.timeout.max)
2024-10-11 09:12:52,758 [s0@group-C119E77DBEA6-FollowerState] INFO
server.RaftServerConfigKeys (ConfUtils.java:logFallback(53)) -
raft.server.rpc.first-election.timeout.min = 150ms (fallback to
raft.server.rpc.timeout.min)
2024-10-11 09:12:52,758 [s0@group-C119E77DBEA6-FollowerState] INFO
server.RaftServerConfigKeys (ConfUtils.java:logFallback(53)) -
raft.server.rpc.first-election.timeout.max = 300ms (fallback to
raft.server.rpc.timeout.max)
...
2024-10-11 09:12:53,925 [s0@group-C119E77DBEA6-FollowerState] INFO
server.RaftServerConfigKeys (ConfUtils.java:logFallback(53)) -
raft.server.rpc.first-election.timeout.min = 150ms (fallback to
raft.server.rpc.timeout.min)
2024-10-11 09:12:53,925 [s0@group-C119E77DBEA6-FollowerState] INFO
server.RaftServerConfigKeys (ConfUtils.java:logFallback(53)) -
raft.server.rpc.first-election.timeout.max = 300ms (fallback to
raft.server.rpc.timeout.max)
2024-10-11 09:12:54,107 [s0-impl-thread1] INFO server.RaftServer$Division
(RaftServerImpl.java:start(407)) - s0@group-C119E77DBEA6: Successfully started.
```
##########
ratis-server/src/test/java/org/apache/ratis/server/impl/LeaderElectionTests.java:
##########
@@ -97,6 +100,70 @@ public void testBasicLeaderElection() throws Exception {
cluster.shutdown();
}
+ static class SleepCode implements CodeInjectionForTesting.Code {
+ private final long sleepMs;
+
+ SleepCode(long sleepMs) {
+ this.sleepMs = sleepMs;
+ }
+
+ @Override
+ public boolean execute(Object localId, Object remoteId, Object... args) {
+ try {
+ LOG.info("{}: Simulate RaftServer startup blocking", localId);
+ Thread.sleep(sleepMs);
+ } catch (InterruptedException e) {
+ throw new RuntimeException(e);
+ }
+ return true;
+ }
+ }
+
+ @Test
+ public void testWaitServerReady() throws Exception {
+ LOG.info("Running testWaitServerReady");
+ int []arr = {1, 2, 3, 4, 5};
+ for (int i : arr) {
+ final MiniRaftCluster cluster = newCluster(1);
+ CodeInjectionForTesting.put(RaftServerImpl.START_COMPLETE, new
SleepCode(1000));
+ cluster.start();
+ // Leader will be elected if the server is ready
+ Assertions.assertTrue(RaftTestUtil.waitForLeader(cluster).getId() !=
null);
+ cluster.shutdown();
+ }
+ }
+
+ @Test
+ public void testAddServerForWaitReady() throws IOException,
InterruptedException {
Review Comment:
This test seems testing similar thing as the first test. Let's remove it?
##########
ratis-server/src/test/java/org/apache/ratis/server/impl/LeaderElectionTests.java:
##########
@@ -97,6 +100,70 @@ public void testBasicLeaderElection() throws Exception {
cluster.shutdown();
}
+ static class SleepCode implements CodeInjectionForTesting.Code {
+ private final long sleepMs;
+
+ SleepCode(long sleepMs) {
+ this.sleepMs = sleepMs;
+ }
+
+ @Override
+ public boolean execute(Object localId, Object remoteId, Object... args) {
+ try {
+ LOG.info("{}: Simulate RaftServer startup blocking", localId);
+ Thread.sleep(sleepMs);
+ } catch (InterruptedException e) {
+ throw new RuntimeException(e);
+ }
+ return true;
+ }
+ }
+
+ @Test
+ public void testWaitServerReady() throws Exception {
+ LOG.info("Running testWaitServerReady");
+ int []arr = {1, 2, 3, 4, 5};
+ for (int i : arr) {
+ final MiniRaftCluster cluster = newCluster(1);
+ CodeInjectionForTesting.put(RaftServerImpl.START_COMPLETE, new
SleepCode(1000));
Review Comment:
Let add a `CodeInjectionForTesting.remove` method. Otherwise, it will
affect other tests.
```diff
+++
b/ratis-common/src/main/java/org/apache/ratis/util/CodeInjectionForTesting.java
@@ -55,6 +55,12 @@ public final class CodeInjectionForTesting {
INJECTION_POINTS.put(injectionPoint, code);
}
+ /** Remove an injection point. */
+ public static void remove(String injectionPoint) {
+ final Code removed = INJECTION_POINTS.remove(injectionPoint);
+ LOG.debug("remove({}): {}", injectionPoint, removed);
+ }
+
/** Execute the injected code, if there is any. */
public static boolean execute(String injectionPoint, Object localId,
```
##########
ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java:
##########
@@ -401,6 +402,7 @@ boolean start() throws IOException {
jmxAdapter.registerMBean();
state.start();
+ CodeInjectionForTesting.execute(START_COMPLETE, getId(), null, role);
startComplete.compareAndSet(false, true);
Review Comment:
Let's add a log message:
```java
if (startComplete.compareAndSet(false, true)) {
LOG.info("{}: Successfully started.", getMemberId());
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]