Core-Raft guarantees safety by using a logical clock (term and index) instead
of a physical clock. However, optimizations such as leader-bypass-read and
leader-lease-read rely on the physical clock (the leader election timeout). We
already have leaderStepDownWaitTime in JvmPauseMonitor to mitigate this
situation. Still, leaderStepDownWaitTime cannot guarantee 100% linearizability.
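
To make the dependency on the physical clock concrete, here is a minimal sketch of a lease check. It is not the Ratis implementation: the class LeaderLeaseSketch, its methods, and the 1000 ms lease are hypothetical names and values chosen for illustration. The clock is injected so that a pause can be simulated; in a real server it would be System::nanoTime. The second case assumes, as suggested later in this thread, that the clock source does not advance while the machine is paused.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical lease tracker for illustration only; not Ratis code. */
public class LeaderLeaseSketch {
    private final LongSupplier nanoClock;  // e.g. System::nanoTime in production
    private final long leaseNanos;         // assumed shorter than the election timeout
    private long lastMajorityAckNanos;

    public LeaderLeaseSketch(LongSupplier nanoClock, long leaseMillis) {
        this.nanoClock = nanoClock;
        this.leaseNanos = TimeUnit.MILLISECONDS.toNanos(leaseMillis);
        this.lastMajorityAckNanos = nanoClock.getAsLong();
    }

    /** Called when a majority of followers has acknowledged a heartbeat. */
    public void onMajorityAck() {
        lastMajorityAckNanos = nanoClock.getAsLong();
    }

    /** True if, according to the local clock, the lease has not expired. */
    public boolean isLeaseValid() {
        return nanoClock.getAsLong() - lastMajorityAckNanos < leaseNanos;
    }

    public static void main(String[] args) {
        long[] fakeNow = {0L};  // simulated clock, in nanoseconds
        LeaderLeaseSketch lease = new LeaderLeaseSketch(() -> fakeNow[0], 1000);

        // Normal case: 1.5 s elapse on the local clock, so the lease expires.
        fakeNow[0] += TimeUnit.MILLISECONDS.toNanos(1500);
        System.out.println("clock advanced: valid=" + lease.isLeaseValid());

        // Pause case: the lease is renewed, then the machine is paused and the
        // clock does not move. The check still passes, no matter how much
        // wall-clock time the rest of the cluster has seen.
        lease.onMajorityAck();
        System.out.println("clock frozen:   valid=" + lease.isLeaseValid());
    }
}
```

The check itself only ever compares two readings of the local clock, so it cannot tell a short interval from a long pause that the clock did not record. That is why a wait time on step-down reduces the risk but does not remove it.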

When JVM pauses are frequent and long, it is better to use the Core-Raft read
path (through the Raft log) if you still want 100% consistency.

Best,
William

> On Oct 24, 2023, at 17:28, Xinyu Tan <[email protected]> wrote:
> 
> Hi, Tsz-Wo
> 
>> BTW, the other timeout mechanisms specified in the Raft algorithm may
>> also not be suitable for a virtual machine environment.
> 
> I suddenly realized that for the "lease read," it uses nanotime to
> determine the duration of the lease. During a virtual machine pause, this
> value in the JVM is likely not to increase. So, it's possible that after
> the old leader's virtual machine is restored, it may still serve read
> requests, leading to the occurrence of a split-brain phenomenon. In this
> regard, perhaps setting it to an infinite value is not a good idea~
> 
> However, I strongly support the idea of introducing a separate parameter to
> distinguish it from the judgment of the "slowFollower." Maybe I can create
> an issue and submit a pull request?
> 
> Thanks
> ------------------------
> Xinyu Tan
> 
> On Sat, Oct 21, 2023 at 00:22, Tsz Wo Sze <[email protected]> wrote:
> 
>> Hi Xinyu,
>> 
>> The JvmPauseMonitor is to monitor the local machine and try to detect if it
>> is non-responsive.  As you know, it will shut down the server when the
>> extra sleep is larger than a threshold.  The design is to detect and
>> prevent a running faulty machine since it may slow down the entire cluster.
>> 
>> I agree that the design is not suitable for a virtual machine environment.
>> (BTW, the other timeout mechanisms specified in the Raft algorithm may
>> also not be suitable for a virtual machine environment.)  As a workaround,
>> it is a good idea to set rpcSlownessTimeout to a large value for disabling
>> the auto-shutdown.  Instead of using rpcSlownessTimeout, how about we use a
>> separate conf for the threshold?  Then, it won't affect the slow follower
>> detection feature.
>> 
>> Tsz-Wo
>> 
>> 
>> On Thu, Oct 19, 2023 at 7:48 PM Xinyu Tan <[email protected]> wrote:
>> 
>>> Hello, Ratis community
>>> 
>>> I would like to understand the rationale behind a specific design detail
>>> of JvmPauseMonitor. In the current code base, when JvmPauseMonitor
>>> observes a JVM pause lasting over 60 seconds, it closes the
>>> RaftServerProxy in the handleJvmPause.
>>> 
>>> In our production system, some users may stop the virtual machine running
>>> the process for several minutes. When they resume the virtual machine,
>>> they find that the RaftServerProxy's state is already Closed, and they must
>>> restart it to restore the correct state. This has caused operational
>>> challenges for us. I would like to know the specific reasons for this
>>> design. What problem is it meant to prevent? If there's no particular
>>> reason, we will consider adjusting the rpcSlownessTimeout to infinity in
>>> IoTDB to disable this feature.
>>> 
>>> Thanks
>>> ------------------------
>>> Xinyu Tan
>>> 
>> 
