[ https://issues.apache.org/jira/browse/SOLR-6231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-6231:
----------------------------------------

    Attachment: SOLR-6231.patch

The good thing about this failure is that in every instance I've seen, we 
still have an overseer; it's just that the overseer is not one of the 
designates. I looked at the logs of a few failures, and it seems the 
re-prioritization was still in progress when the test timed out.

Here's a patch to harden the process. We now have a maximum timeout of 300 
seconds and a shorter 60-second timeout for finding designates, which is 
pushed further ahead each time we see a new overseer elected. The idea is 
that if the overseer hasn't changed within 60 seconds, we're unlikely to see 
a new overseer and should stop. But if the overseer has changed, 
re-prioritization is in progress and we should wait longer.
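
For illustration only, here is a rough sketch of the two-timeout idea described 
above. It is not the code in the attached patch: the class name, the method 
name and all parameters are hypothetical, and the overseer lookup and the 
clock are passed in as plain functions so the logic can be exercised without a 
cluster. Only the 300 second and 60 second values come from the description.

```java
import java.util.Objects;
import java.util.Set;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical helper, not part of Solr or of SOLR-6231.patch.
final class DesignateWait {

    /**
     * Polls until one of the designates is the overseer.
     * Gives up when either the hard cap (maxTimeoutMs) is reached or the
     * overseer has not changed for quietTimeoutMs. Each observed change of
     * overseer pushes the quiet deadline further ahead.
     */
    static boolean waitForDesignate(Supplier<String> currentOverseer,
                                    Set<String> designates,
                                    LongSupplier clockMillis,
                                    long maxTimeoutMs,
                                    long quietTimeoutMs,
                                    long pollIntervalMs) {
        long start = clockMillis.getAsLong();
        long hardDeadline = start + maxTimeoutMs;    // e.g. 300 seconds
        long quietDeadline = start + quietTimeoutMs; // e.g. 60 seconds
        String lastSeen = null;

        while (true) {
            long now = clockMillis.getAsLong();
            if (now >= hardDeadline || now >= quietDeadline) {
                return false; // timed out without a designate as overseer
            }
            String overseer = currentOverseer.get();
            if (overseer != null && designates.contains(overseer)) {
                return true;
            }
            if (overseer != null && !Objects.equals(overseer, lastSeen)) {
                // A different overseer was elected, so re-prioritization
                // is still moving: extend the quiet deadline.
                lastSeen = overseer;
                quietDeadline = now + quietTimeoutMs;
            }
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

With real values this would be called with something like 300000 and 60000 
milliseconds and System::currentTimeMillis as the clock. The hard cap keeps a 
cluster that keeps flapping between non-designate overseers from holding the 
test forever.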

> RollingRestartTest failures on jenkins
> --------------------------------------
>
>                 Key: SOLR-6231
>                 URL: https://issues.apache.org/jira/browse/SOLR-6231
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud, Tests
>            Reporter: Shalin Shekhar Mangar
>             Fix For: 4.10
>
>         Attachments: SOLR-6231.patch
>
>
> A somewhat rare failure on Jenkins. An overseer was available to service 
> requests, but even after waiting for 60 seconds, none of the designates was 
> the overseer.
> {code}
> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Windows/4081/
> Java: 32bit/jdk1.8.0_20-ea-b21 -client -XX:+UseSerialGC
> 1 tests failed.
> REGRESSION:  org.apache.solr.cloud.RollingRestartTest.testDistribSearch
> Error Message:
> No overseer designate as leader found after restart #3: 127.0.0.1:60996_
> Stack Trace:
> java.lang.AssertionError: No overseer designate as leader found after restart #3: 127.0.0.1:60996_
>         at __randomizedtesting.SeedInfo.seed([5263BF570390CF79:D385314F74CFAF45]:0)
>         at org.junit.Assert.fail(Assert.java:93)
>         at org.apache.solr.cloud.RollingRestartTest.restartWithRolesTest(RollingRestartTest.java:100)
>         at org.apache.solr.cloud.RollingRestartTest.doTest(RollingRestartTest.java:61)
>         at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:865)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
