[ 
https://issues.apache.org/jira/browse/SOLR-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997269#comment-15997269
 ] 

Christine Poerschke commented on SOLR-10506:
--------------------------------------------

bq. ... removing a submitted Zookeeper watcher might be pretty hard ..

Yes, you're right. Somehow I thought it could be done since leader elections 
after all can be canceled (there's the ElectionWatcher class in 
LeaderElector.java) but that works differently.

Okay, so if we can't remove a watcher, can we perhaps re-use the one we've got? 
I've added a commit to my 
[jira/solr-10506-branch_6_5|https://github.com/cpoerschke/lucene-solr/tree/jira/solr-10506-branch_6_5]
 branch to explore that possibility.

bq. ... Do you have any hints towards an existing test super class? ...

TestReload and TestConfigReload look like possibilities. 
StressRamUsageEstimator.testLargeSetOfByteArrays does a before/after memory 
measurement. If our test used a large managed resource (as in your use 
case) and did a couple of reloads, then perhaps a clearly measurable difference 
could be detected without the fix and a much smaller (but still non-zero) 
difference with the fix. Something along those lines?
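
To make the before/after idea a bit more concrete, here is a minimal sketch in plain Java. It is only an illustration under assumptions: {{ReloadMemoryProbe}}, {{usedHeapBytes}} and {{heapGrowthAfter}} are hypothetical names, not existing Solr or Lucene test utilities, and the "reload" is simulated by a {{Runnable}} rather than a real collection reload.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Solr or Lucene): measures used-heap growth around an action. */
class ReloadMemoryProbe {

  /** Best-effort used-heap reading; System.gc() is only a hint, so the value is approximate. */
  static long usedHeapBytes() {
    Runtime rt = Runtime.getRuntime();
    for (int i = 0; i < 3; i++) {
      System.gc();
    }
    return rt.totalMemory() - rt.freeMemory();
  }

  /** Runs the action the given number of times and returns the change in used heap. */
  static long heapGrowthAfter(int reloads, Runnable reloadAction) {
    long before = usedHeapBytes();
    for (int i = 0; i < reloads; i++) {
      reloadAction.run();
    }
    return usedHeapBytes() - before;
  }

  public static void main(String[] args) {
    // Simulated leak: every "reload" leaves 4 MB strongly reachable.
    List<byte[]> retained = new ArrayList<>();
    long leaky = heapGrowthAfter(5, () -> retained.add(new byte[4 * 1024 * 1024]));

    // Simulated fix: every "reload" allocates the same amount but retains nothing.
    long clean = heapGrowthAfter(5, () -> {
      byte[] tmp = new byte[4 * 1024 * 1024];
      tmp[0] = 1;
    });

    System.out.println("retained arrays: " + retained.size());
    System.out.println("growth with simulated leak: " + leaky + " bytes");
    System.out.println("growth without leak: " + clean + " bytes");
  }
}
```

In a real test the {{Runnable}} would trigger the collection reload, and the assertion would compare the growth against a threshold chosen well below the size of the managed resource, since heap readings taken this way are noisy.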

> Possible memory leak upon collection reload
> -------------------------------------------
>
>                 Key: SOLR-10506
>                 URL: https://issues.apache.org/jira/browse/SOLR-10506
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Server
>    Affects Versions: 6.5
>            Reporter: Torsten Bøgh Köster
>         Attachments: solr_collection_reload_13_cores.png, 
> solr_gc_path_via_zk_WatchManager.png
>
>
> Upon manual Solr Collection reloading, references to the closed {{SolrCore}} 
> are not fully removed by the garbage collector as a strong reference to the 
> {{ZkIndexSchemaReader}} is held in a ZooKeeper {{Watcher}} that watches for 
> schema changes.
> In our case, this leads to a massive memory leak as managed resources are 
> still referenced by the closed {{SolrCore}}. Our Solr cloud environment 
> utilizes rather large managed resources (synonyms, stopwords). To reproduce, 
> we fired our environment up and reloaded the collection 13 times. As a result, 
> we fully exhausted our heap. A closer look with the YourKit profiler revealed 
> 13 {{SolrCore}} instances still strongly reachable from the garbage 
> collection root (see screenshot 1).
> Each {{SolrCore}} instance is reachable through a single path of strong 
> references from the GC root via a {{Watcher}} in {{ZkIndexSchemaReader}} (see 
> screenshot 2). The {{ZkIndexSchemaReader}} registers a close hook in the 
> {{SolrCore}}, but the ZooKeeper watcher is not removed upon core close.
> We supplied a GitHub pull request 
> (https://github.com/apache/lucene-solr/pull/197) that extracts the ZooKeeper 
> {{Watcher}} as a static inner class. To eliminate the memory leak, the schema 
> reader is held inside a {{WeakReference}} and the reference is explicitly 
> removed on core close.
> Initially I wanted to supply a test case but unfortunately did not find a 
> good starting point ...
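
For reference, the approach described in the issue (a static nested watcher that holds the schema reader through a {{WeakReference}} and drops it on core close) could be sketched roughly as follows. This is a simplified illustration, not the code from the pull request: {{Watcher}} here is a local stand-in for the ZooKeeper interface so that the example compiles on its own, and {{SchemaReader}}, {{SchemaWatcher}}, {{updateSchema}} and {{discardReader}} are hypothetical names.

```java
import java.lang.ref.WeakReference;

/** Local stand-in for the ZooKeeper Watcher interface (assumption, simplified to a String event). */
interface Watcher {
  void process(String event);
}

/** Hypothetical, heavily simplified stand-in for ZkIndexSchemaReader. */
class SchemaReader {
  int updates = 0;
  private final SchemaWatcher watcher = new SchemaWatcher(this);

  /** The watcher instance that would be registered with ZooKeeper. */
  Watcher watcher() {
    return watcher;
  }

  void updateSchema() {
    updates++;
  }

  /** Would be invoked from the core's close hook. */
  void close() {
    watcher.discardReader();
  }

  /** Static nested class: it carries no implicit reference to the enclosing reader. */
  static final class SchemaWatcher implements Watcher {
    private final WeakReference<SchemaReader> readerRef;

    SchemaWatcher(SchemaReader reader) {
      this.readerRef = new WeakReference<>(reader);
    }

    /** Explicitly drops the reference so the watcher no longer reaches the reader. */
    void discardReader() {
      readerRef.clear();
    }

    @Override
    public void process(String event) {
      SchemaReader reader = readerRef.get();
      if (reader == null) {
        return; // reader discarded on close or already collected: nothing to update
      }
      reader.updateSchema();
    }
  }
}
```

Even if ZooKeeper keeps the watcher object registered after the core is closed, in this sketch the watcher only retains an empty {{WeakReference}}, so the reader and everything it references remain eligible for garbage collection.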



