Hi all, I'd like to bring up for discussion how Solr handles failures of various background tasks.
Typically with an ExecutorService, the task can be offloaded to a background thread via `execute(...)` or `submit(...)` methods: - if using `execute(Runnable)` method, any exception thrown by the task (assuming that the task doesn't have a try-catch) is intercepted and handled by the thread's UncaughtExceptionHandler - e.g., printed to console; - if using `submit(Callable<T>)` method, the caller *must* hold on to the future instance returned from the method, as the task result (or the task failure) can only be retrieved by invoking `get()` on the future instance. That said, there are quite a few places in Solr codebase where the background task is created by invoking `submit(...)` method but which do not retan any reference to the returned future. So if the background task fails for any reason, the failure will go completely unnoticed. Some of these places are: - CoreContainer#runAsync ( https://github.com/apache/solr/blob/06950c656f21577db624102b913fb659ef1f0306/solr/core/src/java/org/apache/solr/core/CoreContainer.java#L2588 ) - SolrZkClient.ProcessWatchWithExecutor#process ( https://github.com/apache/solr/blob/06950c656f21577db624102b913fb659ef1f0306/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/SolrZkClient.java#L1077-L1085 ) For example, CoreContainer#runAsync may be used to asynchronously reload the collection schema in certain cases - so if the reloading fails, I imagine the users would want to be aware of the failure and not let it go unnoticed. Does any of the above describe a real issue? Well, so far I tried searching the codebase for usage of ExecutorService `submit(...)` methods and replacing them with `execute(...)` where it makes sense - and then running the tests. Doing so broke 200+ tests due to uncaught exceptions in background threads. But I did not go through those uncaught exceptions to see which ones indicate a real issue and which ones are harmless. Thoughts? Best, Andrey Bozhko