[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

Gus Heck (JIRA) Sun, 21 Feb 2016 12:05:06 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156203#comment-15156203
 ]


Gus Heck commented on SOLR-8349:
--------------------------------

*WRT #3/deterministic behavior*: Here's the use case:

# server is started, it loads a component that loads a file and creates 
resource A version 1 into memory
# some time later the file is updated, and these updates need to be deployed
# the new version 2 of the file is deployed to the server and the core is 
unloaded 
# the core is then loaded again and brought on line and made available to users.

We now cannot predict which version of the resource is available to the users. 
If GC occurred and the resource was collected between steps 3 and 4, the new 
resource will become available as the user would expect. If not, the old 
resource will show up on calls to getResource() until a GC occurs in which the 
JVM decides to clear the weak reference to it. If the component caches a (hard) 
reference to the resource, the new version of the resource will never get 
loaded. The previous system without weak references did not allow the old 
resource to ever be unloaded (and hence was deterministic). Now the behavior is 
a product of GC timing and the internal aspects of how the component was 
programmed. I would like to subsequently (in some later patch) make it possible 
to refresh the resource in a predictable manner without restarting the whole 
node.
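
To make the scenario above concrete, here is a minimal sketch of a weakly referenced resource cache. The class and method names (WeakResourceRegistry, WeakResourceDemo, getResource with a Supplier argument) are hypothetical stand-ins and are not the API in the attached patch. It only demonstrates the deterministic half of the problem: while a component holds a hard reference, the old version keeps being returned after a "reload". Whether the old version survives when nobody holds it depends on GC timing and cannot be shown reliably in a small example.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical container-level registry; an assumption for illustration, not Solr API. */
class WeakResourceRegistry {
    private final Map<String, WeakReference<Object>> cache = new HashMap<>();

    /** Returns the cached resource if its weak reference is still live, otherwise loads it. */
    synchronized Object getResource(String key, Supplier<Object> loader) {
        WeakReference<Object> ref = cache.get(key);
        Object resource = (ref == null) ? null : ref.get();
        if (resource == null) {
            // Only reached on first load or after the GC has cleared the weak reference.
            resource = loader.get();
            cache.put(key, new WeakReference<>(resource));
        }
        return resource;
    }
}

class WeakResourceDemo {
    static String run() {
        WeakResourceRegistry registry = new WeakResourceRegistry();

        // Step 1: the component loads version 1 and keeps a hard reference to it.
        Object held = registry.getResource("A", () -> new StringBuilder("version 1"));

        // Steps 2-4: the file is updated and the core is reloaded, so the loader
        // would now produce version 2 if it were ever invoked.
        Object afterReload = registry.getResource("A", () -> new StringBuilder("version 2"));

        // Because 'held' is still strongly reachable, the weak reference cannot
        // have been cleared and the stale version is returned.
        return (held == afterReload) ? afterReload.toString() : "unexpected: " + afterReload;
    }
}
```

Calling WeakResourceDemo.run() returns the version 1 content even though the second loader would have produced version 2, which is the "new version never gets loaded" case described above.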

*WRT hard references*: I want people to have success not missteps and 
re-implementation using my feature :). For this reason I really like the weak 
references suggestion you made, but I want to manage it for them and not burden 
them with handling it properly. The submitted approach was meant to not bite 
the user who writes a component that never holds a reference to the resource. 
This would be a reasonable naive implementation for someone who knows nothing 
about the internals of Solr and assumes they shouldn't hold the reference, so 
that the same resource is always seen everywhere.

*WRT the abstraction*: it's there to get the loading code added to the 
deferredCallables list.  SolrResourceLoader has no knowledge of the SolrCore 
until the core calls inform(core) on it. Unfortunately inform(resourceLoader) 
gets called before that. So any attempt to cast and do 
((SolrResourceLoader)loader).getCore().getContainer() in the implementation of 
ResourceLoader#inform(loader) will throw an NPE. That's why the 
deferredCallables list exists. I chose to add the abstraction to enable the 
loader/core to manage hard references and allow the processing to become 
uniform with all loads being deferred. I wanted the folks attempting to use 
this to have a clear intuitive path to do so and the interfaces are meant to 
guide them into doing the right thing without needing to know all the details.
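
The ordering problem and the deferral described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: SketchLoader, defer, and informCore are hypothetical names, and the core is represented by a plain Object rather than a real SolrCore. It is not the code in the patch, only the general shape of queuing work until the core is known.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical, simplified stand-in for a resource loader; not the real SolrResourceLoader. */
class SketchLoader {
    private Object core; // stays null until the core announces itself
    private final List<Callable<Void>> deferredCallables = new ArrayList<>();

    Object getCore() {
        return core;
    }

    /** Queues work that needs the core, to be run once the core is known. */
    void defer(Callable<Void> work) {
        deferredCallables.add(work);
    }

    /** Called when the core becomes known; runs everything that was deferred. */
    void informCore(Object core) {
        this.core = core;
        for (Callable<Void> work : deferredCallables) {
            try {
                work.call();
            } catch (Exception e) {
                throw new IllegalStateException("deferred load failed", e);
            }
        }
        deferredCallables.clear();
    }
}

class DeferredDemo {
    static List<String> run() {
        List<String> log = new ArrayList<>();
        SketchLoader loader = new SketchLoader();

        // Phase 1: the component is informed about the loader. The core is not
        // known yet, so dereferencing getCore() here would fail with an NPE.
        log.add("core known during inform(loader): " + (loader.getCore() != null));
        loader.defer(() -> {
            log.add("deferred load ran with core: " + loader.getCore());
            return null;
        });

        // Phase 2: the core informs the loader, which releases the deferred work.
        loader.informCore("core1");
        return log;
    }
}
```

The component author only hands over a callable and never has to reason about which inform() call happens first, which is the point of putting the abstraction in front of the list.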

It's worth noting that if the goal is a simple patch, the way to eliminate the 
MOST complexity from the patch is to have the component author manage 
references, and change: 
{code}
      resourceLoader.inform(resourceLoader);
      resourceLoader.inform(this); // last call before the latch is released.
{code}
 to
{code}
      resourceLoader.inform(this); 
      resourceLoader.inform(resourceLoader); // last call before the latch is released.
{code}

In that case, casting and navigating to the container in inform(ResourceLoader) 
will work and we can lose the abstractions, the deferred callables and 
associated latch/synchronization, and the object reference code goes away 
too... but I definitely don't feel qualified to change the order in which 
components are made aware of things. I have no idea if any code out there would 
be relying on this order of inform() calls in some way. 

Lastly, Object keys are certainly possible, though this does reintroduce a 
vector for class loader memory leakages as previously discussed. I left this 
out because we were not supporting the Lucene analyzers yet, and I wasn't yet 
adding "automatic" keys from configuration nodes. Automatic keys would be a 
nice addition that makes the feature easier to use, so implementors don't need 
to think so hard about it. I'm amenable to trying to add that now if you like, 
though the option to supply one's own key should remain.


> Allow sharing of large in memory data structures across cores
> -------------------------------------------------------------
>
>                 Key: SOLR-8349
>                 URL: https://issues.apache.org/jira/browse/SOLR-8349
>             Project: Solr
>          Issue Type: Improvement
>          Components: Server
>    Affects Versions: 5.3
>            Reporter: Gus Heck
>         Attachments: SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds its own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

