Hoss Man created SOLR-11258:
-------------------------------

             Summary: ChaosMonkeySafeLeaderWithPullReplicasTest fails a lot & reproducibly:  The Monkey ran for over 45 seconds and no jetties were stopped - this is worth investigating!
                 Key: SOLR-11258
                 URL: https://issues.apache.org/jira/browse/SOLR-11258
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Hoss Man


Between June 21 and Aug 18, there have been 18 failures like this one:

{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=ChaosMonkeySafeLeaderWithPullReplicasTest -Dtests.method=test -Dtests.seed=7669B63E9E4D1685 -Dtests.nightly=true -Dtests.slow=true -Dtests.locale=pa-Guru -Dtests.timezone=Europe/Podgorica -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 82.4s | ChaosMonkeySafeLeaderWithPullReplicasTest.test <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: The Monkey ran for over 45 seconds and no jetties were stopped - this is worth investigating!
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([7669B63E9E4D1685:FE3D89E430B17B7D]:0)
   [junit4]    >        at org.apache.solr.cloud.ChaosMonkey.stopTheMonkey(ChaosMonkey.java:587)
   [junit4]    >        at org.apache.solr.cloud.ChaosMonkeySafeLeaderWithPullReplicasTest.test(ChaosMonkeySafeLeaderWithPullReplicasTest.java:174)
   [junit4]    >        at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:993)
   [junit4]    >        at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:968)
   [junit4]    >        at java.lang.Thread.run(Thread.java:748)
{noformat}

In my own testing, when these failures happen, the seeds reproduce, suggesting the problem is a logic flaw in the test that can happen by chance.
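To illustrate why a chance-driven failure like this reproduces reliably from its seed, here is a minimal Java sketch of a hypothetical model (not Solr's actual ChaosMonkey logic): the monkey makes one random stop/no-stop decision per simulated second with a fixed probability. Because `java.util.Random` with the same seed yields the same sequence, a seed that produces zero stops does so on every run. With a fixed per-decision probability p over n decisions, the chance of zero stops is (1 - p)^n, which is small but never zero (roughly 0.1 for the illustrative values here). The class name, the 45 one-second decisions, the 0.05 probability, and the seed 42 are all assumptions chosen for illustration.

```java
import java.util.Random;

/**
 * Hypothetical model, NOT Solr's ChaosMonkey implementation: on each "tick"
 * the monkey decides at random whether to stop a node, with a fixed probability.
 */
public class SeededMonkeyModel {

    /** Counts how many stops a simulated run makes for the given seed. */
    public static int countStops(long seed, int ticks, double stopProbability) {
        Random random = new Random(seed); // same seed -> identical sequence of decisions
        int stops = 0;
        for (int i = 0; i < ticks; i++) {
            if (random.nextDouble() < stopProbability) {
                stops++;
            }
        }
        return stops;
    }

    /** Probability that a run of independent decisions makes zero stops: (1 - p)^n. */
    public static double probabilityOfNoStops(int ticks, double stopProbability) {
        return Math.pow(1.0 - stopProbability, ticks);
    }

    public static void main(String[] args) {
        long seed = 42L; // arbitrary illustrative seed
        int first = countStops(seed, 45, 0.05);
        int second = countStops(seed, 45, 0.05);
        // Both runs print the same count, because the seed fixes every decision.
        System.out.println("stops (run 1): " + first + ", stops (run 2): " + second);
        System.out.println("P(no stops in 45 ticks) = " + probabilityOfNoStops(45, 0.05));
    }
}
```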

Perhaps the ChaosMonkey needs to be changed to get more aggressive about stopping nodes based on how long it's been since the last time it stopped a node?
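One way the "more aggressive over time" idea could look is sketched below in Java. This is a hypothetical policy, not the ChaosMonkey API: the class `EscalatingStopPolicy`, its parameters, and the linear ramp are all illustrative assumptions. The stop probability starts at a base value right after a stop, rises linearly with the time since the last stop, and reaches 1.0 once a maximum quiet period has elapsed, so a run longer than that period cannot end with zero stops.

```java
import java.util.Random;

/**
 * Hypothetical sketch, not Solr's ChaosMonkey API: the chance of stopping a
 * node grows with the time since the last stop, and a stop is forced once
 * maxQuietMillis has passed without one.
 */
public class EscalatingStopPolicy {
    private final double baseProbability; // chance per decision right after a stop
    private final long maxQuietMillis;    // after this long without a stop, always stop
    private final Random random;
    private long lastStopMillis;

    public EscalatingStopPolicy(double baseProbability, long maxQuietMillis, long seed, long startMillis) {
        if (baseProbability < 0.0 || baseProbability > 1.0) {
            throw new IllegalArgumentException("baseProbability must be in [0, 1]");
        }
        if (maxQuietMillis <= 0) {
            throw new IllegalArgumentException("maxQuietMillis must be positive");
        }
        this.baseProbability = baseProbability;
        this.maxQuietMillis = maxQuietMillis;
        this.random = new Random(seed); // seeded so runs stay reproducible
        this.lastStopMillis = startMillis;
    }

    /** Rises linearly from baseProbability to 1.0 as the quiet period approaches maxQuietMillis. */
    public double stopProbability(long nowMillis) {
        long quiet = Math.max(0L, nowMillis - lastStopMillis);
        if (quiet >= maxQuietMillis) {
            return 1.0;
        }
        double fraction = (double) quiet / maxQuietMillis;
        return baseProbability + (1.0 - baseProbability) * fraction;
    }

    /** Decides whether to stop a node now, and records the time if it does. */
    public boolean shouldStop(long nowMillis) {
        // nextDouble() is in [0, 1), so a probability of 1.0 always stops.
        boolean stop = random.nextDouble() < stopProbability(nowMillis);
        if (stop) {
            lastStopMillis = nowMillis;
        }
        return stop;
    }

    public static void main(String[] args) {
        EscalatingStopPolicy policy = new EscalatingStopPolicy(0.05, 15_000L, 42L, 0L);
        int stops = 0;
        for (long t = 1_000L; t <= 45_000L; t += 1_000L) { // one decision per simulated second
            if (policy.shouldStop(t)) {
                stops++;
            }
        }
        System.out.println("stops in 45 simulated seconds: " + stops);
    }
}
```

With a 15-second maximum quiet period, every 15-second window must contain a stop, so a 45-second run makes at least three. A linear ramp is just one simple choice; an exponential ramp or a hard deadline alone would serve the same purpose.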



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
