[
https://issues.apache.org/jira/browse/SOLR-13709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920556#comment-16920556
]
Erick Erickson commented on SOLR-13709:
---------------------------------------
Blocking getCoreDescriptor until CoreContainer.load() is finished isn't going
to work. I have 4 test failures, and special-casing quickly becomes untenable:
I added "if it's the .system collection, you don't have to wait" and another
failure popped out. That's too fragile to consider as a permanent fix even if
it can be made to work.
Hmmm, if it's really just a reload question, I _think_ that a much safer
alternative would be to have the _reload_ operation wait until core loading was
complete. I'll give that a try with some debugging code in place just to prove
the hypothesis.
Thinking more, it seems like swap, unload, and create should all block until
the coreContainer has completed loading as well. Actually, it seems like _all_
core API commands should wait until after CoreContainer.load() is done.
[~hossman] So I put some code in to wait on CoreContainer.load to complete
before any core admin operation is allowed. WDYT about failing hard whenever
this occurs as a _temporary_ way to see if this is actually happening? I'm
thinking on master only but could easily be persuaded to put it on 8x as well.
If we see test failures that contain "EOE" we'll have hit this condition and
I'll have more confidence that this is a reasonable fix.
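A minimal sketch of the idea above, not the actual patch: gate every core admin operation on a latch that is released when loading finishes, with an optional "fail hard" mode that throws instead of waiting so the condition shows up in test logs. The names LoadGate, markLoadComplete, runAdminOp and failHard are hypothetical and are not Solr APIs; only the "EOE" marker is taken from the comment.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical stand-in for the container; NOT Solr's CoreContainer. */
final class LoadGate {
    // Released exactly once, when the (hypothetical) load routine finishes.
    private final CountDownLatch loadDone = new CountDownLatch(1);

    /** Assumed to be called at the very end of loading. */
    void markLoadComplete() {
        loadDone.countDown();
    }

    boolean isLoadComplete() {
        return loadDone.getCount() == 0;
    }

    /**
     * Runs a core admin operation (reload, swap, unload, create, ...).
     * If loading is not finished: with failHard, throw an exception tagged
     * "EOE" (temporary diagnostic mode); otherwise block until it finishes.
     */
    String runAdminOp(String op, boolean failHard) {
        if (!isLoadComplete()) {
            if (failHard) {
                throw new IllegalStateException(
                    "EOE: '" + op + "' arrived before load completed");
            }
            try {
                loadDone.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for load", e);
            }
        }
        // The real operation would run here; the sketch just reports success.
        return op + " ok";
    }

    public static void main(String[] args) {
        LoadGate gate = new LoadGate();
        try {
            gate.runAdminOp("reload", true);   // load not finished: fails hard
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        gate.markLoadComplete();
        System.out.println(gate.runAdminOp("reload", true));
    }
}
```

In this sketch, switching failHard to false turns the diagnostic into the intended permanent behavior: the caller simply waits until loading is done.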
Running precommit and a full test suite tonight.
I'd make this JIRA a blocker while the temporary code is in place.
> Race condition on core reload while core is still loading?
> ----------------------------------------------------------
>
> Key: SOLR-13709
> URL: https://issues.apache.org/jira/browse/SOLR-13709
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Hoss Man
> Assignee: Erick Erickson
> Priority: Major
> Attachments: apache_Lucene-Solr-Tests-8.x_449.log.txt
>
>
> A recent jenkins failure from {{TestSolrCLIRunExample}} seems to suggest that
> there may be a race condition when attempting to re-load a SolrCore while the
> core is currently in the process of (re)loading that can leave the SolrCore
> in an unusable state.
> Details to follow...