We have scripts that use the Solr replica management APIs. The scripts pass the async parameter and then poll until the request has finished.
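
To make the pattern concrete, here is a minimal sketch of polling an async request with a deadline. It is an illustration, not our actual script: the function names, the timeout and interval values, and the `"timed_out"` return value are all made up for this example. It assumes the Collections API `REQUESTSTATUS` action returns JSON with a `status.state` field.

```python
import json
import time
import urllib.parse
import urllib.request


def fetch_state(base_url, request_id):
    """Ask the Collections API for the state of one async request.

    base_url is assumed to look like "http://myHost:8984/solr".
    """
    query = urllib.parse.urlencode(
        {"action": "REQUESTSTATUS", "requestid": request_id, "wt": "json"}
    )
    with urllib.request.urlopen(f"{base_url}/admin/collections?{query}") as resp:
        return json.load(resp)["status"]["state"]


def wait_for_request(get_state, timeout_s=600.0, interval_s=5.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll get_state() until it reports a terminal state or timeout_s elapses.

    Returns the terminal state, or the hypothetical marker "timed_out"
    if the deadline passes while the request is still pending.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        # "notfound" is treated as terminal here: there is nothing left to wait for.
        if state in ("completed", "failed", "notfound"):
            return state
        if clock() >= deadline:
            return "timed_out"
        sleep(interval_s)
```

A caller would wrap the HTTP call, for example `wait_for_request(lambda: fetch_state(base_url, request_id))`. The deadline only stops the script from waiting forever; it does nothing to unstick the request on the Solr side.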
Fairly regularly, the DELETEREPLICA action *never* finishes. I eventually enabled enough logging to see that it is spinning on this:

> INFO (parallelCoreAdminExecutor-19-thread-4-processing-n:myHost:8984_solr x:my_colleciton_shard105_0_replica_n2695 OFYOHGJY3554330096761208 UNLOAD) [ ] o.a.s.c.SolrCore Core my_colleciton_shard105_0_replica_n2695 is not yet closed, waiting 100 ms before checking again.

We have left this for tens of MINUTES (a recent example in our logs shows it spinning for 25 minutes) without it progressing on its own. When we notice this, we have to restart the Solr process, which seems to correct the state for practical purposes and lets us move on. This manual intervention is very painful.

The log statement appears to come from the SolrCore class, in the closeAndWait <https://github.com/apache/solr/blob/33b74e65caf46062737bbc6bc3507a39b1049f67/solr/core/src/java/org/apache/solr/core/SolrCore.java#L1536-L1539> method (called by the unload method). It has a while loop that checks `isClosed`, and `isClosed` just checks whether the reference count is 0.

So my questions are: What could cause the reference count to not reach zero for such a long period of time? Is there any way to get visibility into which references remain? Is this a known or documented issue anywhere?

Thanks
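
P.S. To make my reading of the loop concrete, here is a toy model of the behaviour as I understand it. This is not Solr's code: the class, its method names, and the `max_wait_s` parameter are hypothetical, and the timeout exists only so the example terminates. It shows how a single holder that never releases its reference keeps the wait loop spinning.

```python
import threading
import time


class RefCountedCore:
    """Toy stand-in for a reference-counted core (not Solr's actual class)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._refs = 1  # the owner holds the initial reference

    def open(self):
        """A borrower takes a reference and is expected to close() it later."""
        with self._lock:
            self._refs += 1

    def close(self):
        """Release one reference."""
        with self._lock:
            self._refs -= 1

    def is_closed(self):
        """The core counts as closed once no references remain."""
        with self._lock:
            return self._refs <= 0

    def close_and_wait(self, poll_s=0.1, max_wait_s=None):
        """Release the owner's reference, then poll until the core is closed.

        Returns True once closed. With max_wait_s=None this waits forever,
        which models the behaviour we observe; a finite max_wait_s is a
        hypothetical addition that returns False when the wait expires.
        """
        self.close()
        deadline = None if max_wait_s is None else time.monotonic() + max_wait_s
        while not self.is_closed():
            if deadline is not None and time.monotonic() >= deadline:
                return False
            time.sleep(poll_s)
        return True


leaky = RefCountedCore()
leaky.open()  # a borrower that never calls close()
stuck = not leaky.close_and_wait(poll_s=0.01, max_wait_s=0.05)
```

In this model `stuck` ends up true because one reference is still outstanding when the wait expires, which is the situation I suspect we are hitting. What I cannot see in the real system is who the outstanding holder is.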