(face-palm) -- you were very clear but I misread it; sorry Andrey! I agree with what you propose / imply: code should *not* call submit() if it ignores the returned Future; it misses the point of the submit method! execute() should be used in that case. In fact I recently proposed this very improvement with the same rationale on a PR focused on one spot. You propose to do so widely -- okay. If some spots give trouble (as you found) then at least change the others that don't and leave a TODO comment on the others; maybe linking to a JIRA.
On Thu, Jul 25, 2024 at 11:07 AM Andrey Bozhko <andyboz...@gmail.com> wrote: > > Hi David, > > In the example of SolrZkClient.ProcessWatchWithExecutor#process, it should > be possible to use ExecutorService#submit and ExecutorService#execute > methods interchangeably - because either method would run the task in the > background. So I went ahead and replaced `submit` with `execute`, and ran > the tests. The outcome was that 200+ tests broke, and reported stacktraces > like below: > > > com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an > uncaught exception in thread: Thread[id=68, name=zkCallback-23-thread-1, > state=RUNNABLE, group=TGRP-TestConfigSetsAPI] > > at > __randomizedtesting.SeedInfo.seed([5ECEE82AA0B50470:5D74B82B43D74CFA]:0) > > > > Caused by: > > org.apache.solr.common.SolrException: Error updating shard term for > collection: newcollection > > at __randomizedtesting.SeedInfo.seed([5ECEE82AA0B50470]:0) > > at > app//org.apache.solr.cloud.ZkShardTerms.refreshTerms(ZkShardTerms.java:377) > > at > app//org.apache.solr.cloud.ZkShardTerms.lambda$registerWatcher$8(ZkShardTerms.java:426) > > at > app//org.apache.solr.common.cloud.SolrZkClient$ProcessWatchWithExecutor.lambda$process$1(SolrZkClient.java:1083) > > at > app//org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:363) > > at java.base@17.0.11 > /java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) > > at java.base@17.0.11 > /java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) > > at java.base@17.0.11/java.lang.Thread.run(Thread.java:840) > > > > Caused by: > > org.apache.zookeeper.KeeperException$NoNodeException: > KeeperErrorCode = NoNode for /collections/newcollection/terms/shard1 > > at > app//org.apache.zookeeper.KeeperException.create(KeeperException.java:117) > > at > app//org.apache.zookeeper.KeeperException.create(KeeperException.java:53) > > at > app//org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1972) > > at > app//org.apache.solr.common.cloud.SolrZkClient.lambda$getData$6(SolrZkClient.java:448) > > at > app//org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:70) > > at > app//org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:448) > > at > app//org.apache.solr.cloud.ZkShardTerms.refreshTerms(ZkShardTerms.java:373) > > ... 6 more > 2> NOTE: reproduce with: gradlew test --tests TestConfigSetsAPI.testUpload > -Dtests.seed=5ECEE82AA0B50470 -Dtests.locale=mer-Latn-KE > -Dtests.timezone=Asia/Magadan -Dtests.asserts=true > -Dtests.file.encoding=UTF-8 > > While the error indicates that there is some minor bug/race condition/other > issue with either ZkShardTerms#refreshTerms or the test itself, my takeaway > here is that this issue (and potentially other kinds of issues) was > unintentionally concealed because the result of ExecutorService#submit was > discarded. There is nothing wrong with the implementation of > ExecutorService#submit, but you could say that the method implies > "must-use-return-value", and the code didn't do that. > > For the schema reload scenario, please refer to the existing test > "org.apache.solr.pkg.TestPackages.testSchemaPlugins". In that scenario, > uploading the new package version to the package store triggers reloading > of the schema: > - > https://github.com/apache/solr/blob/661b1dac2284ab556573605ae81a4951a5703c49/solr/core/src/java/org/apache/solr/core/SolrResourceLoader.java#L975-L983 > - > https://github.com/apache/solr/blob/661b1dac2284ab556573605ae81a4951a5703c49/solr/core/src/java/org/apache/solr/pkg/PackageListeners.java#L82 > > This test also has a similarly concealed issue, which could be revealed by > replacing `submit` method with `execute` in CoreContainer#runAsync. > > As for the next steps, I agree with Jason that we could start by auditing > the usage of ExecutorService#submit in the codebase, to see if there are > any more concealed issues and address them. > > I don't think that there's a need to drastically change any exception > handling/logging in Solr. It may suffice to just discourage/disallow the > pattern where the code discards the result of ExecutorService#submit - but > we can definitely discuss more. > > Best, > Andrey Bozhko --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org For additional commands, e-mail: dev-h...@solr.apache.org