[ 
https://issues.apache.org/jira/browse/SOLR-18155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18066436#comment-18066436
 ] 

Pierre Salagnac commented on SOLR-18155:
----------------------------------------

After a second look, I still don't know why this seed triggers the failure 
more often than others. Locally it does not reproduce 100% of the time with 
this seed; it is closer to 80%. I have also seen other test classes 
occasionally fail with the same error, and I have no idea why this is new 
(assuming it is!).

My understanding is that {{CollectionsAPISolrJTest}} reproduces the issue 
often because it creates many collections and does not delete them. The 
{{MiniSolrCloudCluster}} is shut down at the end of the test class, which 
shuts down the nodes one by one, but the collections are still live. This is 
not a problem in itself; it just means leader election is triggered very late, 
after a node has already started its shutdown sequence.


More detailed scenario:
 # The test runs and creates many collections/shards...
 # At the end of the test class, we invoke {{MiniSolrCloudCluster.shutdown()}}, 
which concurrently invokes {{stopJettySolrRunner()}} for all the nodes.
 # Node _A_ initiates its shutdown sequence. In {{ZkController.preClose()}}, 
we invoke {{zkCollectionTerms::close}}, which closes all the existing 
instances of {{ZkShardTerms}}.
 # Node _B_ does the same (concurrently).
 # Node _A_ invokes {{ZkController.tryCancelAllElections()}}, which removes all 
the ephemeral nodes for shard leader elections.
 # Before node _B_ completes the same, one of the replicas on node _B_ is 
elected as the new leader (because of step 5). This causes a new instance of 
{{ZkShardTerms}} to be created on node _B_. It won't be closed because step 
4 is already done.
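
To make the ordering problem concrete, here is a minimal single-threaded sketch of steps 3 to 6. It is not Solr code: {{Terms}}, {{TermsRegistry}} and {{simulateLateElection()}} are hypothetical stand-ins for {{ZkShardTerms}}, the per-node terms registry and the late leader election, and the open-instance counter only imitates what ObjectReleaseTracker reports. The one assumption taken from the scenario above is that the registry creates instances lazily and is closed exactly once during shutdown.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ShardTermsLeakSketch {

    /** Hypothetical stand-in for ZkShardTerms; counts instances that are still open. */
    static final class Terms implements AutoCloseable {
        static final AtomicInteger OPEN = new AtomicInteger();

        Terms() {
            OPEN.incrementAndGet();
        }

        @Override
        public void close() {
            OPEN.decrementAndGet();
        }
    }

    /** Hypothetical stand-in for the per-node registry that lazily creates terms. */
    static final class TermsRegistry implements AutoCloseable {
        private final Map<String, Terms> byShard = new ConcurrentHashMap<>();

        Terms getOrCreate(String shard) {
            return byShard.computeIfAbsent(shard, s -> new Terms());
        }

        /** Closes every instance known at this moment; nothing prevents later creation. */
        @Override
        public void close() {
            byShard.values().forEach(Terms::close);
            byShard.clear();
        }
    }

    /** Replays the scenario for node B and returns the number of unclosed instances. */
    static int simulateLateElection() {
        Terms.OPEN.set(0);
        TermsRegistry nodeB = new TermsRegistry();

        nodeB.getOrCreate("shard1"); // step 1: the test creates collections/shards
        nodeB.close();               // step 4: node B closes its existing terms

        // Step 5 happens on node A (elections cancelled). Step 6: a replica on
        // node B becomes leader, which asks for the shard terms again.
        nodeB.getOrCreate("shard1");

        // The registry was already closed, so nobody closes the new instance.
        return Terms.OPEN.get();
    }

    public static void main(String[] args) {
        System.out.println("unclosed instances: " + simulateLateElection());
    }
}
```

Running {{main}} prints {{unclosed instances: 1}}: the instance created after the registry was closed is the one that leaks. In the real test the outcome depends on timing between the two nodes, which would fit with the failure not reproducing on every run.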

> CollectionsAPISolrJTest seed reliably leaks unclosed ZkShardTerms
> -----------------------------------------------------------------
>
>                 Key: SOLR-18155
>                 URL: https://issues.apache.org/jira/browse/SOLR-18155
>             Project: Solr
>          Issue Type: Task
>            Reporter: Chris M. Hostetter
>            Priority: Major
>
> The following seed reliably fails on {{main}} (as of 
> {{684894af5f1591af1c49c2bb6fdfdd83a94a89b2}}) due to ObjectReleaseTracker's 
> of ZkShardTerms...
> {noformat}
> ./gradlew :solr:solrj:test --tests 
> "org.apache.solr.client.solrj.impl.CloudSolrClientCacheTest.testStaleStateRetryWaitsAfterSkipFailure"
>  "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC 
> -XX:ActiveProcessorCount=1 -XX:ReservedCodeCacheSize=120m" 
> -Ptests.seed=517946C27016E5DC -Ptests.useSecurityManager=true 
> -Ptests.file.encoding=ISO-8859-1
> {noformat}


