Core Raft guarantees safety by using a logical clock (term and index) instead of a physical clock. However, optimizations such as leader-bypass-read and leader-lease-read rely on the physical clock (the leader election timeout). We already have a leaderStepDownWaitTime in JvmPauseMonitor to guard against this situation. Still, leaderStepDownWaitTime cannot guarantee 100% linearizability.
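
To make the concern concrete, here is a minimal sketch of a time-based lease check. It is not Ratis code: the class and method names are hypothetical, and it simply assumes (as suggested in the quoted message below) that the local clock may advance less than real time across a VM pause.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; none of these names exist in Ratis. */
public class LeaseReadSketch {

  /** The leader's own view: the lease holds while its local clock has advanced less than the lease. */
  static boolean leaseLooksValid(long leaseNanos, long localElapsedNanos) {
    return localElapsedNanos < leaseNanos;
  }

  /**
   * A stale read is possible when the lease has expired in real time
   * (so a new leader may already exist) but still looks valid locally.
   */
  static boolean staleReadPossible(long leaseNanos, long realElapsedNanos, long localElapsedNanos) {
    boolean reallyExpired = realElapsedNanos >= leaseNanos;
    return reallyExpired && leaseLooksValid(leaseNanos, localElapsedNanos);
  }

  public static void main(String[] args) {
    long lease = TimeUnit.SECONDS.toNanos(1);
    long pause = TimeUnit.MINUTES.toNanos(5);

    // Assumed worst case: the local clock did not advance at all during the pause.
    System.out.println("clock frozen during pause: " + staleReadPossible(lease, pause, 0L));

    // If the local clock tracked the pause, the lease is seen as expired.
    System.out.println("clock tracked the pause:   " + staleReadPossible(lease, pause, pause));
  }
}
```

A read that goes through the Raft log does not depend on either elapsed-time value, which is why it stays safe in this scenario.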
When JVM pauses are frequent and long, it’s better to use the Core Raft read (through the Raft log) if you still want 100% consistency.

Best,
William

> On Oct 24, 2023, at 17:28, Xinyu Tan <[email protected]> wrote:
>
> Hi, Tsz-Wo
>
>> BTW, the other timeout mechanisms specified in the Raft algorithm may
>> also not be suitable for a virtual machine environment.
>
> I suddenly realized that for the "lease read," it uses nanotime to
> determine the duration of the lease. During a virtual machine pause, this
> value in the JVM is likely not to increase. So, it's possible that after
> the old leader's virtual machine is restored, it may still serve read
> requests, leading to the occurrence of a split-brain phenomenon. In this
> regard, perhaps setting it to an infinite value is not a good idea~
>
> However, I strongly support the idea of introducing a separate parameter to
> distinguish it from the judgment of the "slowFollower." Maybe I can create
> an issue and submit a pull request?
>
> Thanks
> ------------------------
> Xinyu Tan
>
> Tsz Wo Sze <[email protected]> wrote on Sat, Oct 21, 2023 at 00:22:
>
>> Hi Xinyu,
>>
>> The JvmPauseMonitor is to monitor the local machine and try to detect if it
>> is non-responsive. As you know, it will shut down the server when the
>> extra sleep is larger than a threshold. The design is to detect and
>> prevent a running faulty machine since it may slow down the entire cluster.
>>
>> I agree that the design is not suitable for a virtual machine environment.
>> (BTW, the other timeout mechanisms specified in the Raft algorithm may
>> also not be suitable for a virtual machine environment.) As a workaround,
>> it is a good idea to set rpcSlownessTimeout to a large value for disabling
>> the auto-shutdown. Instead of using rpcSlownessTimeout, how about we use a
>> separate conf for the threshold? Then, it won't affect the slow follower
>> detection feature.
>>
>> Tsz-Wo
>>
>>
>> On Thu, Oct 19, 2023 at 7:48 PM Xinyu Tan <[email protected]> wrote:
>>
>>> Hello, Ratis community
>>>
>>> I would like to understand the rationale behind a specific design detail of
>>> JvmPauseMonitor. In the current code base, when JvmPauseMonitor observes a
>>> JVM pause lasting over 60 seconds, it closes the RaftServerProxy in the
>>> handleJvmPause.
>>>
>>> In our production system, some users may stop the virtual machine running
>>> the process for several minutes. When they resume the virtual machine, they
>>> find that the RaftServerProxy's state is already Closed, and they must
>>> restart it to restore the correct state. This has caused operational
>>> challenges for us. I would like to know the specific reasons for this
>>> design. What problem is it meant to prevent? If there's no particular
>>> reason, we will consider adjusting the rpcSlownessTimeout to infinity in
>>> IoTDB to disable this feature.
>>>
>>> Thanks
>>> ------------------------
>>> Xinyu Tan
>>>
>>
