I think it's a case-by-case matter.  I don't think there's something
wrong with the Executor.submit method generally.

Looking at callers of CoreContainer.runAsync, I didn't find the core
reload use-case you speak of.  I did find DistribFileStore.delete and
looked closer.   I do see there's an issue there since it didn't
propagate exceptions as it ought to.  To fix that, I suggest the
delete code use a FutureTask and then loop the results for exceptions.

For SolrZkClient.ProcessWatchWithExecutor#process , I don't know what
you propose should be different; it's up to the Watcher to have its
error handling.  It's fundamentally an async concept.

On Tue, Jul 23, 2024 at 12:55 PM Andrey Bozhko <andyboz...@gmail.com> wrote:
>
> Hi all,
>
> I'd like to bring up for discussion how Solr handles failures of various
> background tasks.
>
>
> Typically with an ExecutorService, the task can be offloaded to a
> background thread via `execute(...)` or `submit(...)` methods:
> - if using `execute(Runnable)` method, any exception thrown by the task
> (assuming that the task doesn't have a try-catch) is intercepted and
> handled by the thread's UncaughtExceptionHandler - e.g., printed to console;
> - if using `submit(Callable<T>)` method, the caller *must* hold on to the
> future instance returned from the method, as the task result (or the task
> failure) can only be
> retrieved by invoking `get()` on the future instance.
>
>
> That said, there are quite a few places in Solr codebase where the
> background task is created by invoking `submit(...)` method but which do
> not retan any reference to the returned future.
> So if the background task fails for any reason, the failure will go
> completely unnoticed.
>
> Some of these places are:
> - CoreContainer#runAsync (
> https://github.com/apache/solr/blob/06950c656f21577db624102b913fb659ef1f0306/solr/core/src/java/org/apache/solr/core/CoreContainer.java#L2588
> )
> - SolrZkClient.ProcessWatchWithExecutor#process (
> https://github.com/apache/solr/blob/06950c656f21577db624102b913fb659ef1f0306/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/SolrZkClient.java#L1077-L1085
> )
>
> For example, CoreContainer#runAsync may be used to asynchronously reload
> the collection schema in certain cases - so if the reloading fails, I
> imagine the users would want to be aware of the failure and not let it go
> unnoticed.
>
>
> Does any of the above describe a real issue? Well, so far I tried searching
> the codebase for usage of ExecutorService `submit(...)` methods and
> replacing them with `execute(...)` where it makes sense - and then running
> the tests. Doing so broke 200+ tests due to uncaught exceptions in
> background threads. But I did not go through those uncaught exceptions to
> see which ones indicate a real issue and which ones are harmless.
>
> Thoughts?
>
>
> Best,
> Andrey Bozhko

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
For additional commands, e-mail: dev-h...@solr.apache.org

Reply via email to