[ 
https://issues.apache.org/jira/browse/SOLR-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172145#comment-16172145
 ] 

Luiz Armesto commented on SOLR-11297:
-------------------------------------

I've been working on this issue for a few days now. The problem occurs when 
you start making requests (e.g. ping) before the cores are completely loaded. In 
this scenario two threads try to load the same core at the same time: one 
executing `CoreContainer#load` as part of the Solr init process and another 
executing `CoreContainer#getCore` to respond to the HTTP request.

The `getCore` method uses the `waitAddPendingCoreOps` method to wait for pending 
core operations before trying to create the core if it isn't loaded. That's ok. 
But the `load` method doesn't put an entry in pending core ops before it tries 
to create the core, so `getCore` has nothing to wait on and both threads can 
end up creating the same core.
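
To make the race concrete, here is a minimal standalone sketch of an unguarded check-then-create sequence. It is not Solr code: the class `UnguardedLoadSketch`, the map `cores`, and the barrier are all hypothetical names for illustration, and the `CyclicBarrier` only exists to force the bad interleaving so the outcome is repeatable.

```java
import java.util.Map;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model, not Solr's CoreContainer: two threads check whether a
// core is loaded and then create it, with no pending-ops entry in between.
public class UnguardedLoadSketch {

    static int duplicateCreations() {
        Map<String, Object> cores = new ConcurrentHashMap<>();
        AtomicInteger createCalls = new AtomicInteger();
        // Forces both threads past the "is it loaded?" check before either creates.
        CyclicBarrier bothChecked = new CyclicBarrier(2);

        Runnable loadWithoutGuard = () -> {
            try {
                if (!cores.containsKey("core1")) {   // check
                    bothChecked.await();             // both threads are now past the check
                    createCalls.incrementAndGet();   // act: stands in for core creation
                    cores.put("core1", new Object());
                }
            } catch (InterruptedException | BrokenBarrierException e) {
                Thread.currentThread().interrupt();
            }
        };

        Thread initThread = new Thread(loadWithoutGuard, "init");
        Thread requestThread = new Thread(loadWithoutGuard, "request");
        initThread.start();
        requestThread.start();
        try {
            initThread.join();
            requestThread.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return createCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("create calls: " + duplicateCreations());
    }
}
```

In this model both threads pass the check before either one creates, so the creation step runs twice. That is the analogue of the second, failing core load described in the issue.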

The solution is to surround the call to the method `createFromDescriptor` with 
`waitAddPendingCoreOps` and `removeFromPendingOps`.
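
As a rough illustration of the idea (not the attached patch), the sketch below models a pending-ops guard around the creation step. The method names are borrowed from the description above, but their bodies and signatures here are simplified assumptions, and the class `PendingOpsSketch` and the helper `loadOrGet` are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified stand-in for the pending-ops bookkeeping; not Solr's implementation.
public class PendingOpsSketch {
    private final Set<String> pendingOps = new HashSet<>();
    private final Map<String, Object> cores = new ConcurrentHashMap<>();
    final AtomicInteger createCalls = new AtomicInteger();

    // Waits while another thread operates on the named core. Returns the core
    // if it is already loaded; otherwise registers a pending op and returns null.
    synchronized Object waitAddPendingCoreOps(String name) throws InterruptedException {
        while (pendingOps.contains(name)) {
            wait();
        }
        Object core = cores.get(name);
        if (core == null) {
            pendingOps.add(name);
        }
        return core;
    }

    synchronized void removeFromPendingOps(String name) {
        pendingOps.remove(name);
        notifyAll(); // wake threads blocked in waitAddPendingCoreOps
    }

    // Stands in for the real core creation; the sleep widens the race window.
    Object createFromDescriptor(String name) throws InterruptedException {
        createCalls.incrementAndGet();
        Thread.sleep(50);
        Object core = new Object();
        cores.put(name, core);
        return core;
    }

    // Hypothetical helper: creation surrounded by the pending-ops calls.
    Object loadOrGet(String name) throws InterruptedException {
        Object core = waitAddPendingCoreOps(name);
        if (core != null) {
            return core; // already loaded, no pending entry was added
        }
        try {
            return createFromDescriptor(name);
        } finally {
            removeFromPendingOps(name);
        }
    }

    static int runRace() {
        PendingOpsSketch container = new PendingOpsSketch();
        Runnable task = () -> {
            try {
                container.loadOrGet("core1");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread initThread = new Thread(task, "init");
        Thread requestThread = new Thread(task, "request");
        initThread.start();
        requestThread.start();
        try {
            initThread.join();
            requestThread.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return container.createCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("create calls: " + runRace());
    }
}
```

Whichever thread gets in first registers the pending op; the other blocks until it is removed and then finds the core already loaded, so the creation step runs once in this model. The `finally` block matters: without it a failed creation would leave the entry behind and block later callers.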

I've attached a shell script showing how to reproduce this issue and a draft 
patch adding the missing method calls. I tried to write a unit test but I 
couldn't figure out how to coordinate two or more threads so that the test is 
deterministic.

> Message "Lock held by this virtual machine" during startup.  Solr is trying 
> to start some cores twice
> -----------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-11297
>                 URL: https://issues.apache.org/jira/browse/SOLR-11297
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 6.6
>            Reporter: Shawn Heisey
>            Assignee: Erick Erickson
>         Attachments: SOLR-11297.patch, SOLR-11297.sh, solr6_6-startup.log
>
>
> Sometimes when Solr is restarted, I get some "lock held by this virtual 
> machine" messages in the log, and the admin UI has messages about a failure 
> to open a new searcher.  It doesn't happen on all cores, and the list of 
> cores that have the problem changes on subsequent restarts.  The cores that 
> exhibit the problems are working just fine -- the first core load is 
> successful, the failure to open a new searcher is on a second core load 
> attempt, which fails.
> None of the cores in the system are sharing an instanceDir or dataDir.  This 
> has been verified several times.
> The index is sharded manually, and the servers are not running in cloud mode.


