Exception handling in background tasks

Andrey Bozhko Tue, 23 Jul 2024 09:55:38 -0700

Hi all,

I'd like to bring up for discussion how Solr handles failures of various
background tasks.



Typically with an ExecutorService, the task can be offloaded to a
background thread via `execute(...)` or `submit(...)` methods:
- if using `execute(Runnable)` method, any exception thrown by the task
(assuming that the task doesn't have a try-catch) is intercepted and
handled by the thread's UncaughtExceptionHandler - e.g., printed to console;
- if using `submit(Callable<T>)` method, the caller *must* hold on to the
future instance returned from the method, as the task result (or the task
failure) can only be
retrieved by invoking `get()` on the future instance.


That said, there are quite a few places in Solr codebase where the
background task is created by invoking `submit(...)` method but which do
not retan any reference to the returned future.
So if the background task fails for any reason, the failure will go
completely unnoticed.

Some of these places are:
- CoreContainer#runAsync (
https://github.com/apache/solr/blob/06950c656f21577db624102b913fb659ef1f0306/solr/core/src/java/org/apache/solr/core/CoreContainer.java#L2588
)
- SolrZkClient.ProcessWatchWithExecutor#process (
https://github.com/apache/solr/blob/06950c656f21577db624102b913fb659ef1f0306/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/SolrZkClient.java#L1077-L1085
)

For example, CoreContainer#runAsync may be used to asynchronously reload
the collection schema in certain cases - so if the reloading fails, I
imagine the users would want to be aware of the failure and not let it go
unnoticed.


Does any of the above describe a real issue? Well, so far I tried searching
the codebase for usage of ExecutorService `submit(...)` methods and
replacing them with `execute(...)` where it makes sense - and then running
the tests. Doing so broke 200+ tests due to uncaught exceptions in
background threads. But I did not go through those uncaught exceptions to
see which ones indicate a real issue and which ones are harmless.

Thoughts?


Best,
Andrey Bozhko

Exception handling in background tasks

Reply via email to