I think it's a case-by-case matter. I don't think there's something wrong with the Executor.submit method generally.
Looking at callers of CoreContainer.runAsync, I didn't find the core reload use-case you speak of. I did find DistribFileStore.delete and looked closer. I do see there's an issue there since it didn't propagate exceptions as it ought to. To fix that, I suggest the delete code use a FutureTask and then loop the results for exceptions. For SolrZkClient.ProcessWatchWithExecutor#process , I don't know what you propose should be different; it's up to the Watcher to have its error handling. It's fundamentally an async concept. On Tue, Jul 23, 2024 at 12:55 PM Andrey Bozhko <andyboz...@gmail.com> wrote: > > Hi all, > > I'd like to bring up for discussion how Solr handles failures of various > background tasks. > > > Typically with an ExecutorService, the task can be offloaded to a > background thread via `execute(...)` or `submit(...)` methods: > - if using `execute(Runnable)` method, any exception thrown by the task > (assuming that the task doesn't have a try-catch) is intercepted and > handled by the thread's UncaughtExceptionHandler - e.g., printed to console; > - if using `submit(Callable<T>)` method, the caller *must* hold on to the > future instance returned from the method, as the task result (or the task > failure) can only be > retrieved by invoking `get()` on the future instance. > > > That said, there are quite a few places in Solr codebase where the > background task is created by invoking `submit(...)` method but which do > not retan any reference to the returned future. > So if the background task fails for any reason, the failure will go > completely unnoticed. > > Some of these places are: > - CoreContainer#runAsync ( > https://github.com/apache/solr/blob/06950c656f21577db624102b913fb659ef1f0306/solr/core/src/java/org/apache/solr/core/CoreContainer.java#L2588 > ) > - SolrZkClient.ProcessWatchWithExecutor#process ( > https://github.com/apache/solr/blob/06950c656f21577db624102b913fb659ef1f0306/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/SolrZkClient.java#L1077-L1085 > ) > > For example, CoreContainer#runAsync may be used to asynchronously reload > the collection schema in certain cases - so if the reloading fails, I > imagine the users would want to be aware of the failure and not let it go > unnoticed. > > > Does any of the above describe a real issue? Well, so far I tried searching > the codebase for usage of ExecutorService `submit(...)` methods and > replacing them with `execute(...)` where it makes sense - and then running > the tests. Doing so broke 200+ tests due to uncaught exceptions in > background threads. But I did not go through those uncaught exceptions to > see which ones indicate a real issue and which ones are harmless. > > Thoughts? > > > Best, > Andrey Bozhko --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org For additional commands, e-mail: dev-h...@solr.apache.org