[
https://issues.apache.org/jira/browse/RATIS-545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16852796#comment-16852796
]
Tsz Wo Nicholas Sze commented on RATIS-545:
-------------------------------------------
Thanks [~ljain] for the patch.
Comparing the requested sleep time with the actual sleep time is a good way
to check whether a machine is having GC or other problems. If a problem is
detected, we should print a warning.
JavaUtils.checkPossibleJVMPause only compares the time values; by itself it
has nothing to do with a JVM pause. I really appreciate the effort of adding
testJVMPauseDetection(). Unfortunately, it tests the comparison but not an
actual JVM pause.
How about we change checkPossibleJVMPause to the following?
{code}
static boolean sleep(long sleepMs, long thresholdMs) throws InterruptedException {
  final Timestamp t = Timestamp.currentTime();
  Thread.sleep(sleepMs);
  final long elapsedMs = t.elapsedTimeMs();
  if (elapsedMs - sleepMs > thresholdMs) {
    LOG.warn("Unexpected long sleep: sleep({}ms) actually took {}ms which is over the threshold {}ms",
        sleepMs, elapsedMs, thresholdMs);
    return false;
  }
  return true;
}
{code}
Then, we may use it in FollowerState as below.
{code}
@@ -91,7 +92,10 @@ class FollowerState extends Daemon {
     while (monitorRunning && server.isFollower()) {
       final long electionTimeout = server.getRandomTimeoutMs();
       try {
-        Thread.sleep(electionTimeout);
+        if (!JavaUtils.sleep(electionTimeout, thresholdMs)) {
+          continue;
+        }
+
         if (!monitorRunning || !server.isFollower()) {
{code}
electionTimeout is a random value. It seems wrong to use it as the threshold.
I think thresholdMs should be a configurable value.
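To make the suggestion concrete, here is a minimal self-contained sketch of the same idea. It is not Ratis code: the class name SleepCheck and the helper isWithinThreshold are hypothetical, System.nanoTime() stands in for Timestamp, and System.err stands in for LOG. The comparison is split out from the sleep so that a unit test can exercise the comparison directly without pretending to test a JVM pause.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, for illustration only; not part of Ratis. */
final class SleepCheck {
  private SleepCheck() {}

  /**
   * Pure comparison of the time values.
   * Returns true if the sleep overshot the request by at most thresholdMs.
   */
  static boolean isWithinThreshold(long sleepMs, long elapsedMs, long thresholdMs) {
    return elapsedMs - sleepMs <= thresholdMs;
  }

  /**
   * Sleeps for sleepMs and reports whether the actual elapsed time stayed
   * within the threshold. A false result hints at a GC pause or a similar
   * stall on the machine; it does not prove one.
   */
  static boolean sleep(long sleepMs, long thresholdMs) throws InterruptedException {
    // System.nanoTime() is monotonic, so wall-clock adjustments do not affect it.
    final long startNs = System.nanoTime();
    Thread.sleep(sleepMs);
    final long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNs);
    if (!isWithinThreshold(sleepMs, elapsedMs, thresholdMs)) {
      // Stand-in for LOG.warn(..) in the snippet above.
      System.err.printf(
          "Unexpected long sleep: sleep(%dms) actually took %dms which is over the threshold %dms%n",
          sleepMs, elapsedMs, thresholdMs);
      return false;
    }
    return true;
  }
}
```

With this shape, the caller passes thresholdMs from configuration rather than deriving it from the random electionTimeout, and a false return simply sends the follower loop back to sleep again instead of starting an election.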
> Leader Election timeout should consider JVM pause interval
> ----------------------------------------------------------
>
> Key: RATIS-545
> URL: https://issues.apache.org/jira/browse/RATIS-545
> Project: Ratis
> Issue Type: Bug
> Reporter: Lokesh Jain
> Assignee: Lokesh Jain
> Priority: Major
> Attachments: RATIS-545.001.patch, RATIS-545.002.patch,
> RATIS-545.003.patch, RATIS-545.004.patch
>
>
> It is possible for a follower to turn itself to a candidate after a JVM
> pause. The timeout in follower should consider the JVM pause interval before
> triggering leader re-election.