Hey James - thanks for looking into this! > a straight-forward solution is to set the system property > "solr.solrj.cache.timeout.sec" to like 60 in the test.
Idk - even if we can make the assertions pass, what is the test really validating? The fundamental assumption it makes about certain types of requests triggering synchronous CLUSTERSTATUS updates is now false. You could argue that having it continue to pass in its current form is misleading. I hear David's concern that the test might detect a future perf regression, but there are other test-cases in ClusterStateProviderTest that cover this gap even if testHttpCspPerf goes away. (To clarify - not a "-1" either way, just offering my 2c. Thanks again for taking this on James!) Best, Jason On Mon, Oct 6, 2025 at 8:13 PM David Smiley <[email protected]> wrote: > > -1 to simply remove that test. It showed we found and fixed a critical > performance problem that could easily re-appear. > > I imagine a straight-forward solution is to set the system property > "solr.solrj.cache.timeout.sec" to like 60 in the test. > > Ideally that new live node cache refresh mechanism will be improved to a > different design that doesn't ping Solr all day long. Staleness of > collection state isn't handled that way. > > ~ David > > On Mon, Oct 6, 2025 at 6:39 PM James Dyer <[email protected]> wrote: > > > I am looking at the recent failures with > > CloudHttp2SolrClientTest#testHttpCspPerf. This test was added with > > SOLR-14985, which fixed a performance regression because CSC wasn't > > caching its state. This test counts the number of logged requests for > > CLUSTERSTATUS. However, with the recent commit on 9/22 for SOLR-17921 > > ("BaseHttpClusterStateProvider should prefetch refreshes of > > liveNodes"), we now have a background task periodically calling > > CLUSTESTATUS for us. Although the test failure does not reproduce for > > me without modifications, I can easily make it happen by adding a > > Thead.sleep to the middle of the test class. > > > > Perhaps SOLR-17921 does not make the worries about performance > > regression here entirely go away, but it for sure makes it hard to > > verify we are not calling CUSTERSTATUS too many times! My inclination > > here is to entirely remove the test method "testHttpCspPerf" as it > > seems less-relevant and less-verifiable. Unless, someone here has a > > good idea of how we can still verify this without the test being > > flaky? > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [email protected] > > For additional commands, e-mail: [email protected] > > > > --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
