[ https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156203#comment-15156203 ]
Gus Heck commented on SOLR-8349:
--------------------------------

*WRT #3/deterministic behavior*: Here's the use case:
# The server is started; it loads a component that loads a file and creates resource A, version 1, in memory.
# Some time later the file is updated, and these updates need to be deployed.
# The new version 2 of the file is deployed to the server and the core is unloaded.
# The core is then loaded again, brought online, and made available to users.

We now cannot predict which version of the resource is available to the users. If a GC occurred and the resource was collected between steps 3 and 4, the new resource will become available as the user would expect. If not, the old resource will show up on calls to getResource() until a GC occurs in which the JVM decides to clear the weak reference to it. If the component caches a (hard) reference to the resource, the new version of the resource will never get loaded.

The previous system without weak references did not allow the old resource to ever be unloaded (and hence was deterministic). Now the behavior is a product of GC timing and the internal aspects of how the component was programmed. I would like to subsequently (in some later patch) make it possible to refresh the resource in a predictable manner without restarting the whole node.

*WRT hard references*: I want people using my feature to have success, not missteps and re-implementation :). For this reason I really like the weak references suggestion you made, but I want to manage it for them and not burden them with handling it properly. The submitted approach was meant to not bite the user who writes a component that never holds a reference to the resource. That would be a reasonable naive implementation for someone who knows nothing about the internals of Solr and assumes they shouldn't hold the reference, to ensure that the same resource is always seen everywhere.

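To make the timing dependence in the use case above concrete, here is a minimal sketch in plain Java. It is not code from the patch: WeakRegistry, simulateCollection() and reload() are hypothetical names invented for illustration, and the explicit clear() call only stands in for a real GC cycle so that the two outcomes can be reproduced on demand. It assumes Java 9+ for Reference.reachabilityFence().

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class WeakResourceDemo {

    // Hypothetical stand-in for a container-level registry of shared resources.
    static final class WeakRegistry {
        private final Map<String, WeakReference<Object>> cache = new HashMap<>();

        synchronized Object getResource(String key, Supplier<Object> loader) {
            WeakReference<Object> ref = cache.get(key);
            Object cached = (ref == null) ? null : ref.get();
            if (cached != null) {
                return cached; // referent not cleared yet: the loader is never consulted
            }
            Object fresh = loader.get();
            cache.put(key, new WeakReference<>(fresh));
            return fresh;
        }

        // Stands in for a GC cycle in which the JVM clears the weak reference.
        synchronized void simulateCollection(String key) {
            WeakReference<Object> ref = cache.get(key);
            if (ref != null) {
                ref.clear();
            }
        }
    }

    // Replays steps 1-4 and returns the version users see after the core reload.
    static String reload(boolean collectedBetweenUnloadAndLoad) {
        WeakRegistry registry = new WeakRegistry();
        String[] fileContents = {"version 1"};
        Supplier<Object> loader = () -> new StringBuilder(fileContents[0]);

        Object first = registry.getResource("dict", loader); // step 1: initial load
        fileContents[0] = "version 2";                       // steps 2-3: file updated, core unloaded
        if (collectedBetweenUnloadAndLoad) {
            registry.simulateCollection("dict");
        }
        Object second = registry.getResource("dict", loader); // step 4: core loaded again

        // Keep the first instance strongly reachable up to this point so that a
        // real GC cannot clear the weak reference behind the demo's back.
        Reference.reachabilityFence(first);
        return second.toString();
    }

    public static void main(String[] args) {
        System.out.println(reload(true));  // prints "version 2"
        System.out.println(reload(false)); // prints "version 1"
    }
}
```

In a real JVM nothing like simulateCollection() exists, which is the point: whether the reloaded core sees version 1 or version 2 depends on whether a collection happened to run and clear the reference between unload and load.
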
*WRT the abstraction*: it's there to get the loading code added to the deferredCallables list. SolrResourceLoader has no knowledge of the SolrCore until the core calls inform(core) on it. Unfortunately inform(resourceLoader) gets called before that. So any attempt to cast and do ((SolrResourceLoader)loader).getCore().getContainer() in the implementation of ResourceLoader#inform(loader) will throw an NPE. That's why the deferredCallables list exists. I chose to add the abstraction to enable the loader/core to manage hard references and to allow the processing to become uniform, with all loads being deferred. I wanted the folks attempting to use this to have a clear, intuitive path to do so, and the interfaces are meant to guide them into doing the right thing without needing to know all the details.

It's worth noting that if the goal is a simple patch, the way to eliminate the MOST complexity from the patch is to have the component author manage references, and change:
{code}
resourceLoader.inform(resourceLoader);
resourceLoader.inform(this); // last call before the latch is released.
{code}
to
{code}
resourceLoader.inform(this);
resourceLoader.inform(resourceLoader); // last call before the latch is released.
{code}
In that case, casting and navigating to the container in inform(ResourceLoader) will work and we can lose the abstractions; the deferred callables and associated latch/synchronization and the object reference code go away too... but I definitely don't feel qualified to change the order in which components are made aware of things. I have no idea if any code out there would be relying on this order of inform() calls in some way.

Lastly, Object keys are certainly possible, though this does reintroduce a vector for class loader memory leaks, as previously discussed. I left this out because we were not supporting the Lucene analyzers yet, and I wasn't yet adding "automatic" keys from configuration nodes.
Automatic keys would be a nice addition that improves the feature and ensures implementors don't need to think so hard to use it. I'm amenable to adding that now if you like, though the option to supply one's own key should remain.

> Allow sharing of large in memory data structures across cores
> -------------------------------------------------------------
>
>                 Key: SOLR-8349
>                 URL: https://issues.apache.org/jira/browse/SOLR-8349
>             Project: Solr
>          Issue Type: Improvement
>          Components: Server
>    Affects Versions: 5.3
>            Reporter: Gus Heck
>         Attachments: SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large
> dictionary or other in-memory structure. When multiple cores are loaded with
> identical configurations utilizing this large in memory structure, each core
> holds its own copy in memory. This has been noted in the past and a specific
> case reported in SOLR-3443. This patch provides a generalized capability, and
> if accepted, this capability will then be used to fix SOLR-3443.