If there's anyone I need to target specifically with this email, I think
it's anyone who has a good working mental map of SolrCloud internals --
how everything interacts with ZK and between multiple nodes. Erick
deserves special mention because he's down in the test trenches
frequently, slogging through the scary places.
There are a number of Solr tests, particularly those having to do with
SolrCloud, that take a long time even on a good test run.
A bunch of them that take a minute or longer seem to involve ZooKeeper
in some way, or exercise some other part of SolrCloud.
Here are some examples of the outliers:
[junit4] Suite: org.apache.solr.cloud.api.collections.ShardSplitTest
[junit4] Completed [421/829] on J2 in 464.11s, 10 tests
[junit4] Suite: org.apache.solr.cloud.BasicDistributedZkTest
[junit4] Completed [446/829] on J0 in 518.82s, 1 test
I'm wondering how much of the time on these long-running cloud tests is
spent waiting for 15-60 second timeouts rather than actually executing
test code. Could we possibly speed some of these tests up just by
adjusting timeouts to lower values? My thought is that if a subsystem
failure is expected as part of a test, why not expedite things so it
happens in 5 seconds or less, instead of waiting 30 or 60 seconds?
Maybe just make this change on tests where we actually do expect
timeouts to be exceeded, not tests where everything is supposed to work
correctly.
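To make the idea a bit more concrete, here is a rough sketch of what I
have in mind, not a patch. Everything named in it is an assumption for
illustration: the TestTimeouts class, the tests.expectedFailureTimeoutMs
property, and the count of six waits are made up and are not existing
Solr test-framework pieces. A test that expects a failure would ask for
a short timeout, while the long production default stays untouched
everywhere else.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Solr or its test framework.
final class TestTimeouts {
    // Assumed property name, invented for this sketch.
    static final String EXPECTED_FAILURE_TIMEOUT_PROP = "tests.expectedFailureTimeoutMs";

    private TestTimeouts() {}

    /**
     * Returns the timeout in milliseconds: the value of the given system
     * property if it is set to a positive number, otherwise the default.
     */
    public static long resolveMillis(String property, long defaultMillis) {
        String raw = System.getProperty(property);
        if (raw == null) {
            return defaultMillis;
        }
        try {
            long value = Long.parseLong(raw.trim());
            return value > 0 ? value : defaultMillis;
        } catch (NumberFormatException e) {
            // Unparseable override: fall back rather than fail the test run.
            return defaultMillis;
        }
    }

    /**
     * Rough wall-clock time saved when a test sits through `waits` expected
     * timeouts and each one shrinks from oldSeconds to newSeconds.
     */
    public static long secondsSaved(int waits, long oldSeconds, long newSeconds) {
        return waits * (oldSeconds - newSeconds);
    }

    public static void main(String[] args) {
        // A test that expects a failure asks for a 5 second default.
        long timeoutMs = resolveMillis(EXPECTED_FAILURE_TIMEOUT_PROP,
                TimeUnit.SECONDS.toMillis(5));
        System.out.println("expected-failure timeout: " + timeoutMs + " ms");

        // Illustrative arithmetic only: six waits is a made-up count.
        System.out.println("saved: " + secondsSaved(6, 30, 5) + " s");
    }
}
```

With those made-up numbers, a single test that waits out six expected
30 second timeouts would finish 150 seconds sooner at 5 seconds each.
Whether any real test looks like that is exactly what I don't know.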
I know that we won't be able to speed up EVERY test in this way. The
timeouts default to such long values because there have been observable
situations in the wild where short timeouts just aren't enough. But if
the idea has merit at all, I think there might be an opportunity to
substantially speed up an overall test run.
Is this idea completely insane?
Thanks,
Shawn