[
https://issues.apache.org/jira/browse/SOLR-13236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763937#comment-16763937
]
Hoss Man commented on SOLR-13236:
---------------------------------
Examples of some of the types of failures I've observed in Jenkins logs...
----
This error occurs inside a catch block while trying to log some info about
the state of the election when the Error/Exception happened. The original
exception is completely lost in the logs because of this
IllegalArgumentException, which arises from calling zkClient().getChildren() on
the hardcoded string
{{"/collections/allReplicasInLIR/leader_elect/shard1/election/"}} -- which, as
the error indicates, is completely illegal because of the trailing "/", and
suggests that this code path was never sanity checked when the test was
written.
{noformat}
[junit4] 2> NOTE: reproduce with: ant test
-Dtestcase=LIROnShardRestartTest -Dtests.method=testAllReplicasInLIR
-Dtests.seed=10B31070AB4A4496 -Dtests.multiplier=2 -Dtests.nightly=true
-Dtests.slow=true -Dtests.badapples=true
-Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-BadApples-NightlyTests-7.x/test-data/enwiki.random.lines.txt
-Dtests.locale=sv-SE -Dtests.timezone=Africa/Lusaka -Dtests.asserts=true
-Dtests.file.encoding=UTF-8
[junit4] ERROR 144s J2 | LIROnShardRestartTest.testAllReplicasInLIR <<<
[junit4] > Throwable #1: java.lang.IllegalArgumentException: Path must
not end with / character
[junit4] > at
__randomizedtesting.SeedInfo.seed([10B31070AB4A4496:4A2B2AB6D5CA2371]:0)
[junit4] > at
org.apache.zookeeper.common.PathUtils.validatePath(PathUtils.java:58)
[junit4] > at
org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1523)
[junit4] > at
org.apache.solr.common.cloud.SolrZkClient.lambda$getChildren$4(SolrZkClient.java:346)
[junit4] > at
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:71)
[junit4] > at
org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:346)
[junit4] > at
org.apache.solr.cloud.LIROnShardRestartTest.testAllReplicasInLIR(LIROnShardRestartTest.java:168)
[junit4] > at java.lang.Thread.run(Thread.java:748)
{noformat}
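To make the two problems above concrete, here is a minimal sketch (not the actual test code). It assumes the only ZooKeeper path rule that matters here is the one quoted in the error message, and every name in it (DiagnosticsSketch, checkNoTrailingSlash, describeElection, runWithDiagnostics) is hypothetical. It shows a diagnostic step that fails on the trailing slash, wrapped so that the failure is attached to the original exception instead of replacing it.

```java
// Sketch only: hypothetical names, not Solr or ZooKeeper code.
final class DiagnosticsSketch {

  /** Mirrors the rule from the error message: only the root path may end with '/'. */
  static void checkNoTrailingSlash(String path) {
    if (path.length() > 1 && path.endsWith("/")) {
      throw new IllegalArgumentException("Path must not end with / character");
    }
  }

  /** Stand-in for the getChildren() call made while logging election state. */
  static String describeElection(String electionPath) {
    checkNoTrailingSlash(electionPath);
    return "election state at " + electionPath;
  }

  /** Runs the body; if it fails, tries to log diagnostics without masking the original error. */
  static void runWithDiagnostics(Runnable body, String electionPath) {
    try {
      body.run();
    } catch (RuntimeException | Error original) {
      try {
        System.err.println(describeElection(electionPath));
      } catch (RuntimeException diagnosticFailure) {
        // Keep the diagnostic failure visible, but secondary to the real problem.
        original.addSuppressed(diagnosticFailure);
      }
      throw original;
    }
  }

  public static void main(String[] args) {
    String bad = "/collections/allReplicasInLIR/leader_elect/shard1/election/";
    try {
      runWithDiagnostics(() -> { throw new IllegalStateException("original failure"); }, bad);
    } catch (IllegalStateException e) {
      System.out.println("reported: " + e.getMessage());
      System.out.println("suppressed: " + e.getSuppressed()[0].getMessage());
    }
  }
}
```

With this shape the stack trace in the logs would lead with the original failure, and the bad path would only show up as a suppressed exception. Dropping the trailing "/" from the hardcoded string would avoid the suppressed exception entirely.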
This is a failure in the last line of the test, after all assertions have
passed, when it tries to delete the collection -- I believe because the check
that is "waiting for replicas rejoin election" doesn't first wait to see all
the nodes disconnected from jetty and marked "down" -- so the election may not
have even happened yet by the time the test finishes; it may just be getting to
the point where all the solr nodes are marked "down" when it tries to clean
up...
{noformat}
[junit4] 2> NOTE: reproduce with: ant test
-Dtestcase=LIROnShardRestartTest -Dtests.method=testAllReplicasInLIR
-Dtests.seed=10B31070AB4A4496 -Dtests.multiplier=2 -Dtests.nightly=true
-Dtests.slow=true -Dtests.badapples=true
-Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-BadApples-NightlyTests-7.x/test-data/enwiki.random.lines.txt
-Dtests.locale=sv-SE -Dtests.timezone=Africa/Lusaka -Dtests.asserts=true
-Dtests.file.encoding=UTF-8
[junit4] ERROR 94.6s J1 | LIROnShardRestartTest.testAllReplicasInLIR <<<
[junit4] > Throwable #1:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers available
to handle this request
[junit4] > at
__randomizedtesting.SeedInfo.seed([10B31070AB4A4496:4A2B2AB6D5CA2371]:0)
[junit4] > at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:461)
[junit4] > at
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1110)
[junit4] > at
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:884)
[junit4] > at
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:817)
[junit4] > at
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
[junit4] > at
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
[junit4] > at
org.apache.solr.cloud.LIROnShardRestartTest.testAllReplicasInLIR(LIROnShardRestartTest.java:175)
[junit4] > at java.lang.Thread.run(Thread.java:748)
{noformat}
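The ordering problem described above can be sketched as follows. This is only an illustration under assumptions: WaitSketch and waitFor are hypothetical helpers (not part of Solr's test framework), and the cluster state is faked with a poll counter. The point is just that the test would need to observe "all replicas down" before it starts waiting on the election, and only then clean up.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Sketch only: WaitSketch is a hypothetical helper, not Solr test framework code.
final class WaitSketch {

  /**
   * Polls the condition until it holds or the timeout elapses.
   * Returns true if the condition held, false on timeout or interruption.
   */
  static boolean waitFor(long timeoutMs, long pollMs, BooleanSupplier condition) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Fake cluster state: replicas count as "down" from the 3rd poll on,
    // and a leader counts as "elected" from the 5th poll on.
    AtomicInteger polls = new AtomicInteger();
    BooleanSupplier allReplicasDown = () -> polls.incrementAndGet() >= 3;
    BooleanSupplier leaderElected = () -> polls.incrementAndGet() >= 5;

    // Order matters: confirm the restart was observed before waiting on the election.
    if (!waitFor(5_000, 10, allReplicasDown)) {
      throw new IllegalStateException("replicas were never marked down");
    }
    if (!waitFor(5_000, 10, leaderElected)) {
      throw new IllegalStateException("no leader was elected");
    }
    System.out.println("safe to delete the collection");
  }
}
```

In the real test the two conditions would have to be checks against the actual cluster state; the fake counters here only stand in for them.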
This is a (similar) failure in the first line of another test method, where it
creates the collection it wants to use. It can happen if the previous test
method fails (or passes) and the next test method is started before all the
nodes have had a chance to re-connect to zk...
{noformat}
[junit4] 2> NOTE: reproduce with: ant test
-Dtestcase=LIROnShardRestartTest -Dtests.method=testSeveralReplicasInLIR
-Dtests.seed=10B31070AB4A4496 -Dtests.multiplier=2 -Dtests.nightly=true
-Dtests.slow=true -Dtests.badapples=true
-Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-BadApples-NightlyTests-7.x/test-data/enwiki.random.lines.txt
-Dtests.locale=sv-SE -Dtests.timezone=Africa/Lusaka -Dtests.asserts=true
-Dtests.file.encoding=UTF-8
[junit4] ERROR 0.60s J1 | LIROnShardRestartTest.testSeveralReplicasInLIR
<<<
[junit4] > Throwable #1:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers available
to handle this request
[junit4] > at
__randomizedtesting.SeedInfo.seed([10B31070AB4A4496:96E987448B1009CF]:0)
[junit4] > at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:461)
[junit4] > at
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1110)
[junit4] > at
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:884)
[junit4] > at
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:817)
[junit4] > at
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
[junit4] > at
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
[junit4] > at
org.apache.solr.cloud.LIROnShardRestartTest.testSeveralReplicasInLIR(LIROnShardRestartTest.java:190)
[junit4] > at java.lang.Thread.run(Thread.java:748)
{noformat}
Here is another error showing how the effects of one test method may not be
adequately cleaned up by the time the next test method starts...
{noformat}
[junit4] 2> NOTE: reproduce with: ant test
-Dtestcase=LIROnShardRestartTest -Dtests.method=testSeveralReplicasInLIR
-Dtests.seed=10B31070AB4A4496 -Dtests.multiplier=2 -Dtests.nightly=true
-Dtests.slow=true -Dtests.badapples=true
-Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-BadApples-NightlyTests-7.x/test-data/enwiki.random.lines.txt
-Dtests.locale=sv-SE -Dtests.timezone=Africa/Lusaka -Dtests.asserts=true
-Dtests.file.encoding=UTF-8
[junit4] ERROR 0.54s J2 | LIROnShardRestartTest.testSeveralReplicasInLIR
<<<
[junit4] > Throwable #1:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://127.0.0.1:44441/solr: Cannot create collection
severalReplicasInLIR. Value of maxShardsPerNode is 1, and the number of nodes
currently live or live and part of your createNodeSet is 2. This allows a
maximum of 2 to be created. Value of numShards is 1, value of nrtReplicas is 3,
value of tlogReplicas is 0 and value of pullReplicas is 0. This requires 3
shards to be created (higher than the allowed number)
[junit4] > at
__randomizedtesting.SeedInfo.seed([10B31070AB4A4496:96E987448B1009CF]:0)
[junit4] > at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643)
[junit4] > at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
[junit4] > at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
[junit4] > at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:484)
[junit4] > at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:414)
[junit4] > at
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1110)
[junit4] > at
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:884)
[junit4] > at
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:817)
[junit4] > at
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
[junit4] > at
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
[junit4] > at
org.apache.solr.cloud.LIROnShardRestartTest.testSeveralReplicasInLIR(LIROnShardRestartTest.java:190)
[junit4] > at java.lang.Thread.run(Thread.java:748)
{noformat}
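The numbers in that error message work out as follows. This is a simplified sketch of the arithmetic the message describes, not Solr's actual placement code, and the class and method names are hypothetical. With only 2 of the 3 nodes live again when the next test method starts, the request for 3 replicas cannot be satisfied.

```java
// Sketch only: hypothetical names, simplified from the wording of the error message.
final class PlacementMathSketch {

  /** Upper bound on replicas the cluster will accept right now. */
  static int maxAllowed(int liveNodes, int maxShardsPerNode) {
    return liveNodes * maxShardsPerNode;
  }

  /** Total replicas a create request asks for across all shards. */
  static int requested(int numShards, int nrtReplicas, int tlogReplicas, int pullReplicas) {
    return numShards * (nrtReplicas + tlogReplicas + pullReplicas);
  }

  public static void main(String[] args) {
    int allowed = maxAllowed(2, 1);       // 2 live nodes, maxShardsPerNode=1
    int wanted = requested(1, 3, 0, 0);   // numShards=1, nrtReplicas=3
    System.out.println("allowed=" + allowed + " requested=" + wanted
        + " rejected=" + (wanted > allowed));
  }
}
```

Once all 3 nodes are live again the same request fits (3 * 1 = 3), which is why the failure depends on how quickly the nodes come back after the previous test method.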
> numerous problems with LIROnShardRestartTest
> --------------------------------------------
>
> Key: SOLR-13236
> URL: https://issues.apache.org/jira/browse/SOLR-13236
> Project: Solr
> Issue Type: Sub-task
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Hoss Man
> Priority: Major
>
> LIROnShardRestartTest is a frequent cause of jenkins failures -- but only on
> the 7x jenkins jobs, because it was removed from master/8x as part of
> SOLR-11812 since the underlying implementation being tested was deprecated
> and removed in 8x.
> I spent some time looking into trying to fix this test, but the amount of
> work it appears it would take to fix doesn't seem worth the effort given its
> deprecated status. So I'm filing this issue purely for tracking purposes
> with the plan to disable the test and resolve this jira as "Won't Fix" -- if
> anyone else is interested in working on it they can feel free to re-open.