[ 
https://issues.apache.org/jira/browse/RATIS-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze resolved RATIS-1695.
-------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

The pull request #747 is now merged.  Thanks, [~jiacheliu3]!

Could you file a JIRA for any remaining work?

> Use a Builder for Daemon
> ------------------------
>
>                 Key: RATIS-1695
>                 URL: https://issues.apache.org/jira/browse/RATIS-1695
>             Project: Ratis
>          Issue Type: Improvement
>          Components: server
>            Reporter: Jiacheng Liu
>            Assignee: Jiacheng Liu
>            Priority: Major
>             Fix For: 3.0.0
>
>         Attachments: 733_review.patch
>
>          Time Spent: 6h
>  Remaining Estimate: 0h
>
> In Ratis, many threads are created by instantiating the `Daemon` class 
> directly. If such a thread throws an uncaught exception, it dies silently 
> and no other component is notified. If the thread is a critical component, 
> part of the RaftServer is effectively down while the RaftServer's lifecycle 
> is still RUNNING (it is never set to EXCEPTION because the thread had no 
> chance to do so).
> One example is [https://github.com/apache/ratis/pull/417/files]. Before 
> that change, the StateMachineUpdater thread could throw an NPE and exit, 
> leaving the follower RaftServer stale forever. The RaftServer's lifecycle 
> stays RUNNING, and an external caller has no way to detect the failure 
> through `RaftServer.getLifeCycleState()`.
> The proposal is to improve observability on RaftServer so that an uncaught 
> exception is caught and propagated to the external user, in three steps:
>  # Every `Daemon` thread should have an UncaughtExceptionHandler set.
>  # Add an extra field to the RaftServer to store an exception, and that field 
> can be set by the UncaughtExceptionHandler instances.
>  # The UncaughtExceptionHandler also transitions the RaftServer to EXCEPTION 
> state.
> So external users can:
> {code:java}
> RaftServer server = RaftServer.newBuilder().build();
> // Periodically check
> if (server.getLifeCycleState() == State.EXCEPTION) {
>   Throwable t = server.getError();
>   // Deal with the throwable
> }{code}
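
The three steps proposed above can be sketched with plain JDK threads. This is only an illustration under stated assumptions, not the code merged in the pull request: `SketchServer`, `State`, `fail` and `newDaemon` are hypothetical names invented here, and only `Thread.setUncaughtExceptionHandler` is a real JDK call.

```java
import java.util.concurrent.atomic.AtomicReference;

public class DaemonHandlerSketch {
  /** Hypothetical stand-in for the lifecycle states named in the issue. */
  enum State { RUNNING, EXCEPTION }

  /** Hypothetical minimal server; it is not the real RaftServer. */
  static class SketchServer {
    private final AtomicReference<State> state = new AtomicReference<>(State.RUNNING);
    private final AtomicReference<Throwable> error = new AtomicReference<>();

    State getLifeCycleState() { return state.get(); }

    Throwable getError() { return error.get(); }

    /** Steps 2 and 3: keep the first exception, then move to EXCEPTION. */
    void fail(Throwable t) {
      error.compareAndSet(null, t);
      state.set(State.EXCEPTION);
    }

    /** Step 1: every daemon thread is created with a handler attached. */
    Thread newDaemon(String name, Runnable task) {
      Thread t = new Thread(task, name);
      t.setDaemon(true);
      t.setUncaughtExceptionHandler((thread, e) -> fail(e));
      return t;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    SketchServer server = new SketchServer();
    // Simulate a critical thread that dies with an NPE.
    Thread updater = server.newDaemon("updater-sketch", () -> {
      throw new NullPointerException("simulated failure");
    });
    updater.start();
    updater.join(); // the handler runs on the dying thread before join() returns

    if (server.getLifeCycleState() == State.EXCEPTION) {
      System.out.println("Server failed: " + server.getError());
    }
  }
}
```

Creating the thread and attaching the handler in one place (here the hypothetical `newDaemon`) means no caller can forget the handler, which is presumably the motivation for building `Daemon` instances through a builder.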



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
