[jira] [Commented] (SOLR-13709) Race condition on core reload while core is still loading?

Erick Erickson (Jira) Sun, 01 Sep 2019 18:37:14 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-13709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920556#comment-16920556
 ]


Erick Erickson commented on SOLR-13709:
---------------------------------------

Blocking getCoreDescriptor until CoreContainer.load() is finished isn't going 
to work. I have 4 test failures, and it's getting untenable to say "if 
it's the .system collection, you don't have to wait". I did that and another 
one popped out. It's too fragile to consider as a permanent fix, even if it can 
be made to work.

Hmmm, if it's really just a reload question, I _think_ that a much safer 
alternative would be to have the _reload_ operation wait until core loading is 
complete. I'll give that a try with some debugging code in place, just to prove 
the hypothesis.
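A minimal sketch of the "reload waits for load" idea, assuming a one-shot latch that is released at the end of container loading. LoadGate, HypotheticalCoreContainer and the timeout value are hypothetical names chosen for illustration; this is not Solr's actual CoreContainer code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: LoadGate and HypotheticalCoreContainer are illustrative
// names, not Solr classes. Real core loading is replaced by a comment.
public class ReloadGateSketch {

    /** A one-shot gate that opens exactly once, when container loading finishes. */
    static final class LoadGate {
        private final CountDownLatch loaded = new CountDownLatch(1);

        void markLoaded() { loaded.countDown(); }

        boolean isLoaded() { return loaded.getCount() == 0; }

        /** Blocks until loading completes; returns false if the timeout elapses first. */
        boolean awaitLoaded(long timeout, TimeUnit unit) throws InterruptedException {
            return loaded.await(timeout, unit);
        }
    }

    static final class HypotheticalCoreContainer {
        final LoadGate gate = new LoadGate();
        private final long reloadWaitMs;

        HypotheticalCoreContainer(long reloadWaitMs) { this.reloadWaitMs = reloadWaitMs; }

        void load() {
            // ... discover and load all cores here ...
            gate.markLoaded(); // only now may reloads proceed
        }

        /** Reload waits for load() to finish instead of racing with it. */
        String reload(String coreName) {
            try {
                if (!gate.awaitLoaded(reloadWaitMs, TimeUnit.MILLISECONDS)) {
                    throw new IllegalStateException(
                        "Timed out waiting for load() before reloading " + coreName);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                throw new IllegalStateException("Interrupted before reloading " + coreName, e);
            }
            return "reloaded " + coreName;
        }
    }

    public static void main(String[] args) throws Exception {
        HypotheticalCoreContainer cc = new HypotheticalCoreContainer(30_000);
        // A reload request that arrives while the container is still loading.
        Thread reloader = new Thread(() -> System.out.println(cc.reload("collection1")));
        reloader.start();
        Thread.sleep(100); // simulate slow core loading
        cc.load();         // opens the gate and releases the blocked reload
        reloader.join();
    }
}
```

The timeout keeps a stuck load from hanging reload requests forever; whether a timed-out reload should fail or retry is a separate policy decision.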

Thinking more, it seems like swap, unload, and create should all block until 
the CoreContainer has completed loading as well. Actually, it seems like _all_ 
core API commands should wait until after CoreContainer.load() is done.
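Extending that to every core admin command suggests a single choke point rather than per-command checks. The sketch below assumes all actions are routed through one dispatcher; CoreAction and AdminDispatcher are hypothetical names and do not reflect Solr's CoreAdminHandler API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch: CoreAction and AdminDispatcher are illustrative names,
// not Solr's CoreAdminHandler API.
public class AdminGateSketch {

    enum CoreAction { CREATE, RELOAD, SWAP, UNLOAD }

    /** Single choke point: every core admin action waits for container load. */
    static final class AdminDispatcher {
        private final CountDownLatch loaded = new CountDownLatch(1);
        private final long waitMs;

        AdminDispatcher(long waitMs) { this.waitMs = waitMs; }

        void containerLoaded() { loaded.countDown(); }

        <T> T execute(CoreAction action, Supplier<T> op) {
            try {
                if (!loaded.await(waitMs, TimeUnit.MILLISECONDS)) {
                    throw new IllegalStateException(action + " rejected: container still loading");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                throw new IllegalStateException(action + " interrupted while waiting for load", e);
            }
            return op.get(); // runs only after load() has completed
        }
    }

    public static void main(String[] args) {
        AdminDispatcher dispatcher = new AdminDispatcher(50);
        try {
            dispatcher.execute(CoreAction.SWAP, () -> "swapped");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // SWAP rejected: container still loading
        }
        dispatcher.containerLoaded();
        for (CoreAction a : CoreAction.values()) {
            System.out.println(dispatcher.execute(a, () -> a + " ok"));
        }
    }
}
```

Putting the wait in one place avoids the special-casing problem seen with getCoreDescriptor: no individual command needs to know whether it is exempt.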

[~hossman] So I put some code in to wait for CoreContainer.load() to complete 
before any core admin operation is allowed. WDYT about failing hard whenever 
this occurs, as a _temporary_ way to see if this is actually happening? I'm 
thinking master only, but could easily be persuaded to put it on 8x as well. 
If we see test failures that contain "EOE", we'll have hit this condition and 
I'll have more confidence that this is a reasonable fix.
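The temporary fail-hard check could look roughly like this, assuming a flag set when loading finishes and a guard called at the start of each core admin operation. DiagnosticGuard is a hypothetical name; only the "EOE" marker comes from the comment above.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, temporary diagnostic: DiagnosticGuard is an illustrative name.
// The "EOE" marker mirrors the string proposed above so test logs can be grepped.
public class FailHardSketch {

    static final class DiagnosticGuard {
        private final AtomicBoolean loadComplete = new AtomicBoolean(false);

        void markLoadComplete() { loadComplete.set(true); }

        /** Fails hard instead of waiting, so the race shows up in test output. */
        void checkBeforeAdminOp(String action, String coreName) {
            if (!loadComplete.get()) {
                throw new IllegalStateException("EOE: core admin " + action + " on " + coreName
                    + " arrived before CoreContainer.load() completed");
            }
        }
    }

    public static void main(String[] args) {
        DiagnosticGuard guard = new DiagnosticGuard();
        try {
            guard.checkBeforeAdminOp("RELOAD", "collection1");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // the early request is reported, not hidden
        }
        guard.markLoadComplete();
        guard.checkBeforeAdminOp("RELOAD", "collection1"); // passes silently
        System.out.println("no EOE after load");
    }
}
```

Once test runs confirm (or rule out) the race, the throw would be swapped for the blocking wait.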

Running precommit and a full test suite tonight.

I'd make this JIRA a blocker while the temporary code is in place.

> Race condition on core reload while core is still loading?
> ----------------------------------------------------------
>
>                 Key: SOLR-13709
>                 URL: https://issues.apache.org/jira/browse/SOLR-13709
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Assignee: Erick Erickson
>            Priority: Major
>         Attachments: apache_Lucene-Solr-Tests-8.x_449.log.txt
>
>
> A recent jenkins failure from {{TestSolrCLIRunExample}} seems to suggest that 
> there may be a race condition when attempting to re-load a SolrCore while the 
> core is currently in the process of (re)loading that can leave the SolrCore 
> in an unusable state.
> Details to follow...



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

