[ https://issues.apache.org/jira/browse/SOLR-17764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17979336#comment-17979336 ]
Mark Robert Miller commented on SOLR-17764:
-------------------------------------------

I don't have the code in front of me, so maybe you changed this or it was changed, but from memory:

• JettySolrRunner.stop() short-circuits the normal Jetty life-cycle so that unit tests finish quickly
• it explicitly calls coreContainer.shutdown() before it invokes Server.stop()
• it sets server.setStopTimeout(0) so Jetty never blocks waiting for in-flight requests
• lots of tests may never add the StatisticsHandler?

In which case, tests in general would not be testing graceful shutdown, and would expect to hit a 503 or some random failure caused by something already being closed, depending on races and on how widely that is-closed check is peppered through the code.

503 should mean retry. That won't bulletproof a test that is counting on a request finishing after a cluster shutdown, or after a whole shard is shut down, but it should be fairly bulletproof when a single instance, or all instances in a shard but one, gets shut down.

> "graceful" jetty shutdown causes ChaosMonkeySafeLeaderWithPullReplicasTest failures
> -----------------------------------------------------------------------------------
>
>                 Key: SOLR-17764
>                 URL: https://issues.apache.org/jira/browse/SOLR-17764
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Chris M. Hostetter
>            Priority: Major
>         Attachments: E7F93005B9386058.OUTPUT-org.apache.solr.cloud.ChaosMonkeySafeLeaderWithPullReplicasTest.txt
>
>
> Reviewing recent Jenkins test failure metrics, I noticed that (Nightly) test ChaosMonkeySafeLeaderWithPullReplicasTest started failing ~60% of the time right around the time that SOLR-17744 was committed.
> Things I have observed:
> * Seeds from failing runs seem to reliably reproduce the failure
> ** These failures do *NOT* reproduce if I revert to just before SOLR-17744
> * In ad-hoc testing, seeds that do _not_ fail on the first attempt seem to reliably succeed on all subsequent attempts
> ** Suggesting that the root cause is something deterministic in the {{random()}}-ness of the test, and not something dependent on timing or concurrency.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
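
To make the "503 should mean retry" point from the comment concrete, here is a minimal sketch of what a test-side retry could look like. It is only an illustration under stated assumptions: RetryOn503 and sendWithRetry are hypothetical names, the IntSupplier stands in for whatever actually sends the request and reports the HTTP status, and none of this is Solr or SolrJ API.

```java
import java.util.function.IntSupplier;

// Hypothetical helper, not part of Solr: retries a request while it returns 503.
final class RetryOn503 {

    /**
     * Calls the request until it returns a status other than 503 or the
     * attempts run out, and returns the last status seen.
     * The request is modelled as an IntSupplier yielding an HTTP status code,
     * which is an assumption made to keep the sketch self-contained.
     */
    static int sendWithRetry(IntSupplier request, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        int status = 503;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            status = request.getAsInt();
            if (status != 503) {
                return status; // success, or a failure that a retry will not fix
            }
            if (attempt < maxAttempts) {
                try {
                    // Linear backoff: wait a little longer after each 503.
                    Thread.sleep(backoffMillis * attempt);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and give up
                    return status;
                }
            }
        }
        return status; // still 503 after all attempts
    }

    public static void main(String[] args) {
        // Simulated node: answers 503 twice (shutting down), then 200.
        int[] responses = {503, 503, 200};
        int[] calls = {0};
        int status = sendWithRetry(
                () -> responses[Math.min(calls[0]++, responses.length - 1)], 5, 10);
        System.out.println("final status " + status + " after " + calls[0] + " attempts");
    }
}
```

As the comment says, a loop like this only helps while some other instance can still serve the request; if the whole shard or cluster is down, every attempt returns 503 and the caller still sees the failure.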