[
https://issues.apache.org/jira/browse/SOLR-7408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499217#comment-14499217
]
Shai Erera commented on SOLR-7408:
----------------------------------
bq. Though I think I understand what you're saying here, can you elaborate more
on this?
If we wanted to change the code such that we put a listener in the map on a
SolrCore creation, and remove it from the map on a SolrCore close, I believe we
wouldn't be running into such concurrency issues. In a sense, this is what is
done when all is *good*: a SolrCore puts a listener in its ctor, and removes it
in its close().
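To make that lifecycle concrete, here is a minimal sketch, assuming a registry keyed by configDir in which each core adds one listener on construction and removes only that listener on close. {{ConfListenerRegistry}} and {{SketchCore}} are hypothetical names for illustration, not Solr's actual classes.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the configDir -> listeners map; not Solr's actual class.
class ConfListenerRegistry {
    private final Map<String, Set<Runnable>> listeners = new ConcurrentHashMap<>();

    // Adds the listener atomically with respect to other updates of the same key.
    void register(String confDir, Runnable listener) {
        listeners.compute(confDir, (key, existing) -> {
            Set<Runnable> set = (existing != null) ? existing : ConcurrentHashMap.newKeySet();
            set.add(listener);
            return set;
        });
    }

    // Removes only the given listener; drops the map entry once it is empty.
    void unregister(String confDir, Runnable listener) {
        listeners.computeIfPresent(confDir, (key, set) -> {
            set.remove(listener);
            return set.isEmpty() ? null : set;
        });
    }

    int count(String confDir) {
        Set<Runnable> set = listeners.get(confDir);
        return set == null ? 0 : set.size();
    }
}

// Hypothetical core: registers in its constructor, unregisters in close().
class SketchCore implements AutoCloseable {
    private final ConfListenerRegistry registry;
    private final String confDir;
    // Anonymous class so that every core owns a distinct listener instance.
    private final Runnable listener = new Runnable() {
        @Override
        public void run() { /* react to a config change */ }
    };

    SketchCore(ConfListenerRegistry registry, String confDir) {
        this.registry = registry;
        this.confDir = confDir;
        registry.register(confDir, listener);
    }

    @Override
    public void close() {
        registry.unregister(confDir, listener);
    }
}
```

Because a core only ever touches its own listener here, closing one core cannot remove a listener that belongs to another core sharing the same configDir.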
But if something goes *wrong*, we may leave dangling listeners of SolrCore
instances that no longer exist. This is what I believe
{{CoreAdminHandler.handleCreateAction}} attempts to address: if a core creation
failed, it attempts to unregister all listeners of a configDir from the map,
and lets {{unregister}} decide if the entry itself can be removed or not. This
ensures that we won't be left with dangling listeners that will never be
released, which is what I referred to as leaking listeners.
The code in {{unregister}} relies on the same logic that introduces the bug:
if there is no core in SolrCores which references this configDir, remove all
listeners. The problem is that a core registers a listener before it is put in
SolrCores, and hence the race condition.
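The interleaving can be replayed in a single thread. This is only a sketch: it assumes {{unregister}} drops every listener of a configDir when it finds no core in SolrCores referencing it, and {{RaceModel}} and its methods are hypothetical names that do not mirror Solr's classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the state involved; names do not mirror Solr's classes.
class RaceModel {
    // configDir -> names of cores whose constructors registered a listener
    final Map<String, List<String>> listeners = new HashMap<>();
    // configDirs referenced by cores already published in "SolrCores"
    final Set<String> publishedConfDirs = new HashSet<>();

    // Step performed inside a core's constructor.
    void registerListener(String confDir, String coreName) {
        listeners.computeIfAbsent(confDir, key -> new ArrayList<>()).add(coreName);
    }

    // Step performed later, once the core has been fully created.
    void publishCore(String confDir) {
        publishedConfDirs.add(confDir);
    }

    // Assumed cleanup rule: if no published core references confDir, drop every listener.
    void unregisterAll(String confDir) {
        if (!publishedConfDirs.contains(confDir)) {
            listeners.remove(confDir);
        }
    }

    boolean hasListener(String confDir, String coreName) {
        List<String> names = listeners.get(confDir);
        return names != null && names.contains(coreName);
    }

    // Replays the problematic ordering of steps taken by two threads.
    static boolean listenerSurvivesBadInterleaving() {
        RaceModel model = new RaceModel();
        model.registerListener("conf1", "coreA"); // thread 1: coreA's constructor
        model.unregisterAll("conf1");             // thread 2: cleanup after coreB failed
        model.publishCore("conf1");               // thread 1: coreA is put in SolrCores
        return model.hasListener("conf1", "coreA");
    }
}
```

In this model coreA ends up published but without its listener, because the cleanup ran in the window between registration and publication.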
I would personally prefer that we stop removing all listeners, and let a core
take care of itself, but I don't know how safe the Solr code is in that regard.
I.e. do all places that create a SolrCore clean up after it in the event of a
failure? Clearly {{CoreAdminHandler.handleCreateAction}} doesn't, which got me
wondering which other places don't do that either.
But, if we want to change the logic like that, we can certainly look at all the
places that do {{new SolrCore(...)}} and make sure they call
{{SolrCore.close()}} in the event of any failure.
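A minimal sketch of what such a call site could look like, assuming the failure happens after the constructor has returned. {{TrackedCore}} and {{CoreFactory}} are hypothetical stand-ins, and the counter only exists so that the cleanup is observable.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a core that registers a listener in its constructor.
class TrackedCore implements AutoCloseable {
    // Counts listeners that have been registered but not yet released.
    static final AtomicInteger openListeners = new AtomicInteger();

    TrackedCore() {
        openListeners.incrementAndGet();
    }

    @Override
    public void close() {
        openListeners.decrementAndGet();
    }
}

class CoreFactory {
    // Creates a core and runs a post-construction step (e.g. publishing it).
    // If that step throws, the core is closed so its listener is not left behind.
    static TrackedCore create(Runnable postConstruct) {
        TrackedCore core = new TrackedCore();
        boolean success = false;
        try {
            postConstruct.run();
            success = true;
            return core;
        } finally {
            if (!success) {
                core.close();
            }
        }
    }
}
```

Note that this pattern only covers failures after construction. If the constructor itself throws after registering a listener, the caller never receives a reference to close, so the constructor would have to undo its own registration.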
> Race condition can cause a config directory listener to be removed
> ------------------------------------------------------------------
>
> Key: SOLR-7408
> URL: https://issues.apache.org/jira/browse/SOLR-7408
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Reporter: Shai Erera
> Assignee: Shai Erera
> Attachments: SOLR-7408.patch, SOLR-7408.patch
>
>
> This has been reported here: http://markmail.org/message/ynkm2axkdprppgef,
> and I was able to reproduce it in a test, although I am only able to
> reproduce if I put break points and manually simulate the problematic context
> switches.