Solr has good test coverage verifying that it closes most things that should be closed, including cores. Still, I could see UNLOAD being enhanced to insist that the core be closed after waiting a few minutes.
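
To make the idea concrete, here is a rough sketch of a close-and-wait loop that gives up after a deadline instead of polling forever. It is not Solr's code: `RefCountedCore`, `wasForced`, and the timeout and poll parameters are all hypothetical names made up for illustration, and the real SolrCore close path does far more than adjust a counter. The only thing taken from the thread is the shape of the loop (poll `isClosed`, which checks whether the reference count has reached zero).

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a reference-counted core; NOT Solr's SolrCore. */
final class RefCountedCore {
    // Starts at 1: the reference held by whoever owns the core.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean forced = false;

    /** Take an extra reference, e.g. for an in-flight request. */
    void open() { refCount.incrementAndGet(); }

    /** Release one reference. */
    void close() { refCount.decrementAndGet(); }

    /** Closed means no references remain. */
    boolean isClosed() { return refCount.get() <= 0; }

    /** True if the deadline expired and the count was zeroed by force. */
    boolean wasForced() { return forced; }

    /**
     * Releases the caller's reference, then polls until the core is closed
     * or the timeout expires. Returns true on a normal close. On timeout it
     * zeroes the count, reports how many references leaked, and returns false.
     */
    boolean closeAndWait(long timeoutMs, long pollMs) {
        close();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!isClosed()) {
            if (System.nanoTime() - deadline >= 0) {
                int leaked = refCount.getAndSet(0);
                forced = true;
                System.err.println("Forcing close; " + leaked + " reference(s) never released");
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return isClosed();
            }
        }
        return true;
    }
}
```

With this sketch, a core that has an outstanding `open()` without a matching `close()` would make `closeAndWait(50, 10)` return false after roughly 50 ms rather than spin indefinitely. Reporting the leaked count at that point is also the natural place to add visibility into which references remain, though forcing a close while something still holds a reference has its own risks.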

On Tue, Dec 10, 2024 at 2:17 PM Zack Kendall <zachariahkend...@gmail.com> wrote:

> We have scripts that use the Solr Replica management APIs. The scripts use
> the async parameter and poll until the request has finished.
>
> Fairly regularly the DELETEREPLICA action will *never* finish.
>
> I eventually enabled enough logging to see that it is spinning on this:
>
> INFO (parallelCoreAdminExecutor-19-thread-4-processing-n:myHost:8984_solr x:my_colleciton_shard105_0_replica_n2695 OFYOHGJY3554330096761208 UNLOAD) [ ] o.a.s.c.SolrCore Core my_colleciton_shard105_0_replica_n2695 is not yet closed, waiting 100 ms before checking again.
>
> We have left this for tens of MINUTES (I see a recent example in our logs
> of this spinning for 25 minutes) without it progressing on its own. When we
> notice this, we have to restart the Solr process, which seems to correct the
> state for practical purposes and lets us move on. This manual intervention
> is very painful.
>
> The log statement appears to come from the SolrCore class, in the
> closeAndWait
> <https://github.com/apache/solr/blob/33b74e65caf46062737bbc6bc3507a39b1049f67/solr/core/src/java/org/apache/solr/core/SolrCore.java#L1536-L1539>
> method (called by the unload method). It has a while loop checking
> `isClosed`, and isClosed just checks whether the reference count is 0.
>
> So the question is: what could cause the reference count to not reach zero
> for such a long period of time? Is there any way to get visibility into
> which references remain? Is this a known or documented issue anywhere?
>
> Thanks