If there's anyone I need to target specifically with this email, I think it's anyone who has a good working mental map of SolrCloud internals -- how everything interacts with ZK and between multiple nodes.  Erick deserves special mention because he's down in the test trenches frequently, slogging through the scary places.

There are a number of Solr tests, particularly those having to do with SolrCloud, that take a long time even on a particularly good test run.  A bunch of the ones that take a minute or longer seem to involve ZooKeeper in some way, or exercise some other part of SolrCloud.

Here are some examples of the outliers:

   [junit4] Suite: org.apache.solr.cloud.api.collections.ShardSplitTest
   [junit4] Completed [421/829] on J2 in 464.11s, 10 tests

   [junit4] Suite: org.apache.solr.cloud.BasicDistributedZkTest
   [junit4] Completed [446/829] on J0 in 518.82s, 1 test

I'm wondering how much of the time on these long-running cloud tests is spent waiting for 15-60 second timeouts rather than actually executing test code.  Could we possibly speed some of these tests up just by adjusting timeouts to lower values?  My thought is that if a subsystem failure is expected as part of a test, why not expedite things so it happens in 5 seconds or less, instead of waiting 30 or 60 seconds?  Maybe just make this change on tests where we actually do expect timeouts to be exceeded, not tests where everything is supposed to work correctly.
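
To make the idea concrete, here's a rough sketch in plain Java.  It is not taken from the Solr test framework: the class, the waitFor helper, and the sketch.failureTimeoutMs property are all made-up names for illustration, and I'm assuming the real tests wait in some roughly similar polling fashion.  The point is only that a test which expects a failure could ask for a short timeout, while everything else keeps the long default.

```java
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class ShortTimeoutSketch {

    // Hypothetical property name, NOT a real Solr setting.
    static final String TIMEOUT_PROP = "sketch.failureTimeoutMs";

    // Timeout for tests that EXPECT a failure: 5 seconds unless overridden.
    static Duration expectedFailureTimeout() {
        return Duration.ofMillis(Long.getLong(TIMEOUT_PROP, 5000L));
    }

    // Polls the condition until it becomes true or the timeout elapses.
    static void waitFor(BooleanSupplier condition, Duration timeout)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("condition not met within " + timeout);
            }
            Thread.sleep(50); // poll interval, arbitrary for this sketch
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        try {
            // Stand-in for a subsystem that the test has taken down on purpose.
            waitFor(() -> false, Duration.ofMillis(200));
        } catch (TimeoutException expected) {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("failed as expected after about " + elapsedMs + " ms");
        }
    }
}
```

With something like this, a test that deliberately kills a node would call waitFor(..., expectedFailureTimeout()) and give up in about 5 seconds, while tests where everything is supposed to work would keep passing the existing 30 or 60 second values.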

I know that we won't be able to speed up EVERY test in this way.  The timeouts default to such long values because there have been observable situations in the wild where short timeouts just aren't enough.  But if the idea has merit at all, I think there might be an opportunity to substantially speed up an overall test run.

Is this idea completely insane?

Thanks,
Shawn


