(face-palm) -- you were very clear but I misread it; sorry Andrey!

I agree with what you propose / imply: code should *not* call submit()
if it ignores the returned Future; it misses the point of the submit
method!  execute() should be used in that case.  In fact I recently
proposed this very improvement with the same rationale on a PR focused
on one spot.  You propose to do so widely -- okay.  If some spots give
trouble (as you found) then at least change the others that don't and
leave a TODO comment on the others; maybe linking to a JIRA.

On Thu, Jul 25, 2024 at 11:07 AM Andrey Bozhko <andyboz...@gmail.com> wrote:
>
> Hi David,
>
> In the example of SolrZkClient.ProcessWatchWithExecutor#process, it should
> be possible to use ExecutorService#submit and ExecutorService#execute
> methods interchangeably - because either method would run the task in the
> background. So I went ahead and replaced `submit` with `execute`, and ran
> the tests. The outcome was that 200+ tests broke, and reported stacktraces
> like below:
>
> > com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an
> uncaught exception in thread: Thread[id=68, name=zkCallback-23-thread-1,
> state=RUNNABLE, group=TGRP-TestConfigSetsAPI]
>  >     at
> __randomizedtesting.SeedInfo.seed([5ECEE82AA0B50470:5D74B82B43D74CFA]:0)
>  >
>  >     Caused by:
>  >     org.apache.solr.common.SolrException: Error updating shard term for
> collection: newcollection
>  >         at __randomizedtesting.SeedInfo.seed([5ECEE82AA0B50470]:0)
>  >         at
> app//org.apache.solr.cloud.ZkShardTerms.refreshTerms(ZkShardTerms.java:377)
>  >         at
> app//org.apache.solr.cloud.ZkShardTerms.lambda$registerWatcher$8(ZkShardTerms.java:426)
>  >         at
> app//org.apache.solr.common.cloud.SolrZkClient$ProcessWatchWithExecutor.lambda$process$1(SolrZkClient.java:1083)
>  >         at
> app//org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:363)
>  >         at java.base@17.0.11
> /java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  >         at java.base@17.0.11
> /java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  >         at java.base@17.0.11/java.lang.Thread.run(Thread.java:840)
>  >
>  >         Caused by:
>  >         org.apache.zookeeper.KeeperException$NoNodeException:
> KeeperErrorCode = NoNode for /collections/newcollection/terms/shard1
>  >             at
> app//org.apache.zookeeper.KeeperException.create(KeeperException.java:117)
>  >             at
> app//org.apache.zookeeper.KeeperException.create(KeeperException.java:53)
>  >             at
> app//org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1972)
>  >             at
> app//org.apache.solr.common.cloud.SolrZkClient.lambda$getData$6(SolrZkClient.java:448)
>  >             at
> app//org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:70)
>  >             at
> app//org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:448)
>  >             at
> app//org.apache.solr.cloud.ZkShardTerms.refreshTerms(ZkShardTerms.java:373)
>  >             ... 6 more
> 2> NOTE: reproduce with: gradlew test --tests TestConfigSetsAPI.testUpload
> -Dtests.seed=5ECEE82AA0B50470 -Dtests.locale=mer-Latn-KE
> -Dtests.timezone=Asia/Magadan -Dtests.asserts=true
> -Dtests.file.encoding=UTF-8
>
> While the error indicates that there is some minor bug/race condition/other
> issue with either ZkShardTerms#refreshTerms or the test itself, my takeaway
> here is that this issue (and potentially other kinds of issues) was
> unintentionally concealed because the result of ExecutorService#submit was
> discarded. There is nothing wrong with the implementation of
> ExecutorService#submit, but you could say that the method implies
> "must-use-return-value", and the code didn't do that.
>
> For the schema reload scenario, please refer to the existing test
> "org.apache.solr.pkg.TestPackages.testSchemaPlugins". In that scenario,
> uploading the new package version to the package store triggers reloading
> of the schema:
> -
> https://github.com/apache/solr/blob/661b1dac2284ab556573605ae81a4951a5703c49/solr/core/src/java/org/apache/solr/core/SolrResourceLoader.java#L975-L983
> -
> https://github.com/apache/solr/blob/661b1dac2284ab556573605ae81a4951a5703c49/solr/core/src/java/org/apache/solr/pkg/PackageListeners.java#L82
>
> This test also has a similarly concealed issue, which could be revealed by
> replacing `submit` method with `execute` in CoreContainer#runAsync.
>
> As for the next steps, I agree with Jason that we could start by auditing
> the usage of ExecutorService#submit in the codebase, to see if there are
> any more concealed issues and address them.
>
> I don't think that there's a need to drastically change any exception
> handling/logging in Solr. It may suffice to just discourage/disallow the
> pattern where the code discards the result of ExecutorService#submit - but
> we can definitely discuss more.
>
> Best,
> Andrey Bozhko

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
For additional commands, e-mail: dev-h...@solr.apache.org

Reply via email to