[ https://issues.apache.org/jira/browse/RATIS-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsz-wo Sze resolved RATIS-1695.
-------------------------------
Fix Version/s: 3.0.0
Resolution: Fixed
The pull request #747 is now merged. Thanks, [~jiacheliu3]!
Could you file JIRAs for any remaining work?
> Use a Builder for Daemon
> ------------------------
>
> Key: RATIS-1695
> URL: https://issues.apache.org/jira/browse/RATIS-1695
> Project: Ratis
> Issue Type: Improvement
> Components: server
> Reporter: Jiacheng Liu
> Assignee: Jiacheng Liu
> Priority: Major
> Fix For: 3.0.0
>
> Attachments: 733_review.patch
>
> Time Spent: 6h
> Remaining Estimate: 0h
>
> In Ratis, many threads are created manually with the `Daemon` class. If such a
> thread throws an uncaught exception, it simply dies without any other component
> knowing. If that thread is a critical component, part of the RaftServer is
> effectively down, yet the RaftServer's lifecycle state is still RUNNING (it is
> never set to EXCEPTION because the dying thread has no chance to do so).
> One example is the bug fixed by
> [https://github.com/apache/ratis/pull/417/files]. Before that change, the
> StateMachineUpdater thread could throw an NPE and exit, leaving the follower
> RaftServer stale forever. The RaftServer's lifecycle state stays RUNNING, so an
> external party has no way to detect the failure through
> `RaftServer.getLifeCycleState()`.
> The proposal is to improve the observability of RaftServer, so that an uncaught
> exception is caught and propagated to the external user, in three steps:
> # Every `Daemon` thread should have an UncaughtExceptionHandler set.
> # Add a field to the RaftServer that stores the exception; the
> UncaughtExceptionHandler instances set this field.
> # The UncaughtExceptionHandler also transitions the RaftServer to the EXCEPTION
> state.
> External users can then detect the failure as follows:
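> {code:java}
> // Builder configuration (server ID, state machine, properties) omitted
> RaftServer server = RaftServer.newBuilder().build();
> // Periodically check
> if (server.getLifeCycleState() == State.EXCEPTION) {
>   // getError() is the new accessor proposed in item 2 above
>   Throwable t = server.getError();
>   // Deal with the throwable
> }
> {code}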
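A minimal plain-Java sketch of the three numbered items of the proposal, under stated assumptions: `DaemonSketch`, its `Builder`, `ServerErrorState`, and the nested `State` enum are hypothetical names for illustration, not necessarily the API merged in pull request #747 and not Ratis's `LifeCycle`. Every thread the builder creates gets an UncaughtExceptionHandler (item 1) that records the first error in a per-server field (item 2) and moves the state to EXCEPTION (item 3).

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch for illustration; not the actual Ratis Daemon/LifeCycle API.
class DaemonSketch {
  enum State { RUNNING, EXCEPTION }

  /** Per-server error holder: the extra field (item 2) plus the transition (item 3). */
  static final class ServerErrorState {
    private final AtomicReference<State> state = new AtomicReference<>(State.RUNNING);
    private final AtomicReference<Throwable> error = new AtomicReference<>();

    void fail(Throwable t) {
      // Keep only the first error, and record it before publishing EXCEPTION
      // so a reader that sees EXCEPTION also sees a non-null error.
      error.compareAndSet(null, t);
      state.set(State.EXCEPTION);
    }

    State getLifeCycleState() { return state.get(); }

    Throwable getError() { return error.get(); }
  }

  /** Builder that never hands out a thread without an UncaughtExceptionHandler (item 1). */
  static final class Builder {
    private String name;
    private Runnable runnable;
    private ServerErrorState errorState;

    Builder setName(String name) { this.name = name; return this; }

    Builder setRunnable(Runnable runnable) { this.runnable = runnable; return this; }

    Builder setErrorState(ServerErrorState errorState) { this.errorState = errorState; return this; }

    Thread build() {
      ServerErrorState target = Objects.requireNonNull(errorState, "errorState");
      Thread t = new Thread(Objects.requireNonNull(runnable, "runnable"),
          Objects.requireNonNull(name, "name"));
      t.setDaemon(true); // like a daemon thread, do not block JVM exit
      // An uncaught exception now updates the server's error state instead of vanishing.
      t.setUncaughtExceptionHandler((thread, e) -> target.fail(e));
      return t;
    }
  }

  static Builder newBuilder() { return new Builder(); }
}
```

Requiring the error holder in `build()` is what routing thread creation through a builder buys: a thread cannot be created without the handler wired in.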
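The `// Periodically check` step could run on a scheduler instead of a hand-rolled loop. In this sketch, `ErrorWatcher` is a hypothetical helper (not part of Ratis), and its two suppliers stand in for `server.getLifeCycleState() == State.EXCEPTION` and the proposed `server.getError()`; the watcher reports the error to a callback at most once.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of Ratis.
class ErrorWatcher implements AutoCloseable {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "error-watcher");
        t.setDaemon(true); // do not keep the JVM alive just for polling
        return t;
      });
  private final AtomicBoolean reported = new AtomicBoolean(false);

  /**
   * @param inException stand-in for {@code server.getLifeCycleState() == State.EXCEPTION}
   * @param getError    stand-in for the proposed {@code server.getError()}
   * @param onError     invoked at most once with the recorded error
   */
  ErrorWatcher(BooleanSupplier inException, Supplier<Throwable> getError,
               Consumer<Throwable> onError, long periodMillis) {
    scheduler.scheduleAtFixedRate(() -> {
      if (inException.getAsBoolean() && reported.compareAndSet(false, true)) {
        try {
          onError.accept(getError.get());
        } catch (RuntimeException e) {
          // An exception escaping a scheduled task cancels its later runs without
          // notice (the same kind of silent failure described above), so log it.
          System.err.println("error-watcher callback failed: " + e);
        }
      }
    }, 0, periodMillis, TimeUnit.MILLISECONDS);
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```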