[
https://issues.apache.org/jira/browse/SOLR-9189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316873#comment-15316873
]
Hoss Man commented on SOLR-9189:
--------------------------------
My initial gut paranoia skimming the jenkins emails this morning was to assume
that this might be because of SOLR-5776 -- the hypothesis being: "The increased
randomized use of ssl (factoring in tests.nightly / tests.multiplier) is
causing more tests to slow down due to the crypto calculations"
... but that hypothesis seemed weak once I started looking at the logs -- there
is a "Randomized ssl" line as part of the logs for every SolrTestCaseJ4
subclass showing if ssl is being used or not...
* http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Linux/834/
** 25 test failures
** only 7 of those were using ssl
* https://builds.apache.org/job/Lucene-Solr-NightlyTests-master/1034/
** 44 test failures
** only 17 of those were using ssl
...even if we assume every test failure where ssl was in use was directly
caused by ssl, that still leaves a really high increase in the number of failed
tests in those two runs.
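To make that arithmetic explicit, here is a minimal sketch that subtracts the ssl-using failures from the totals quoted above. The dictionary keys and the helper name `non_ssl_failures` are just illustrative labels, not anything that exists in the build tooling:

```python
# Failure counts quoted above: (total failed tests, failed tests that were using ssl)
runs = {
    "Lucene-Solr-6.x-Linux/834": (25, 7),
    "Lucene-Solr-NightlyTests-master/1034": (44, 17),
}


def non_ssl_failures(total, with_ssl):
    """Failures that remain even if every ssl-using failure is blamed on ssl."""
    return total - with_ssl


for name, (total, with_ssl) in runs.items():
    rest = non_ssl_failures(total, with_ssl)
    print(f"{name}: {rest} of {total} failures did not use ssl ({rest / total:.0%})")
```

That leaves 18 of 25 and 27 of 44 failures in tests that were not using ssl at all.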
So my amended (paranoid) hypothesis is "The increased randomized use of ssl
(factoring in tests.nightly / tests.multiplier) is causing more tests to slow
down due to the crypto calculations *EVEN IN OTHER TESTS AT THE SAME TIME DUE
TO CPU STARVATION*"
I'm going to commit a blanket disable of all SSL randomization _on master_ ASAP
to test this hypothesis.
Part of me feels like this is an overkill reaction, and that a more rational
response would simply be to undo the "increased odds of using ssl" portion of
SOLR-5776 -- but I'd really like to get a definitive understanding of whether
SSL usage is really having such a seriously pronounced effect on other tests in
the same jenkins run -- OR -- *is it just a red herring, and some other recent
change has caused serious timeout issues?*
> explosion of timeout related failures in jenkins the past few days
> ------------------------------------------------------------------
>
> Key: SOLR-9189
> URL: https://issues.apache.org/jira/browse/SOLR-9189
> Project: Solr
> Issue Type: Bug
> Reporter: Hoss Man
> Assignee: Hoss Man
> Priority: Critical
>
> In the past few days, something has gone seriously wonky with our jenkins
> tests -- causing a serious explosion in the number of test failures --
> notably due to various sorts of timeouts...
> * "Unable to create core ... Timed out getting coreNodeName for ..."
> * "msg=SolrCore is loading,code=503"
> * "Timeout occured while waiting response from server"
> * "No registered leader was found after waiting for 30000ms"
> * "Unable to create core ... Caused by: Timed out getting shard id for core:
> ..."