[
https://issues.apache.org/jira/browse/SOLR-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Erick Erickson reassigned SOLR-14969:
-------------------------------------
Assignee: Erick Erickson
> Prevent creating multiple cores with the same name which leads to
> instabilities (race condition)
> ------------------------------------------------------------------------------------------------
>
> Key: SOLR-14969
> URL: https://issues.apache.org/jira/browse/SOLR-14969
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: multicore
> Affects Versions: 8.6, 8.6.3
> Reporter: Andreas Hubold
> Assignee: Erick Erickson
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> CoreContainer#create does not correctly handle concurrent requests to create
> the same core. There is a race condition (see also the existing TODO comment
> in the code), so CoreContainer#createFromDescriptor may be called more than
> once for the same core name.
> The _second call_ then fails to create an IndexWriter, and the exception
> handling for that failure leaves the CoreContainer in an inconsistent state.
> {noformat}
> 2020-10-27 00:29:25.350 ERROR (qtp2029754983-24) [ ] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Error CREATEing SolrCore 'blueprint_acgqqafsogyc_comments': Unable to create core [blueprint_acgqqafsogyc_comments] Caused by: Lock held by this virtual machine: /var/solr/data/blueprint_acgqqafsogyc_comments/data/index/write.lock
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1312)
> at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$0(CoreAdminOperation.java:95)
> at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:367)
> ...
> Caused by: org.apache.solr.common.SolrException: Unable to create core [blueprint_acgqqafsogyc_comments]
> at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1408)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1273)
> ... 47 more
> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1071)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:906)
> at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1387)
> ... 48 more
> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2184)
> at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2308)
> at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1130)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1012)
> ... 50 more
> Caused by: org.apache.lucene.store.LockObtainFailedException: Lock held by this virtual machine: /var/solr/data/blueprint_acgqqafsogyc_comments/data/index/write.lock
> at org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:139)
> at org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:41)
> at org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:45)
> at org.apache.lucene.store.FilterDirectory.obtainLock(FilterDirectory.java:105)
> at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:785)
> at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:126)
> at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:100)
> at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:261)
> at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:135)
> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2145)
> {noformat}
> CoreContainer#createFromDescriptor removes the CoreDescriptor when handling
> this exception. The SolrCore created for the first successful call is still
> registered in SolrCores.cores, but now there's no corresponding
> CoreDescriptor for that name anymore.
> This inconsistency leads to subsequent NullPointerExceptions, for example
> when calling CoreAdmin STATUS for that core name.
> CoreAdminOperation#getCoreStatus first gets the non-null SolrCore
> (cores.getCore(cname)), but core.getInstancePath() then throws an NPE because
> the CoreDescriptor is no longer registered:
> {noformat}
> 2020-10-27 00:29:25.353 INFO (qtp2029754983-19) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={core=blueprint_acgqqafsogyc_comments&action=STATUS&indexInfo=false&wt=javabin&version=2} status=500 QTime=0
> 2020-10-27 00:29:25.353 ERROR (qtp2029754983-19) [ ] o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: Error handling 'STATUS' action
> at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:372)
> at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:397)
> at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181)
> ...
> Caused by: java.lang.NullPointerException
> at org.apache.solr.core.SolrCore.getInstancePath(SolrCore.java:333)
> at org.apache.solr.handler.admin.CoreAdminOperation.getCoreStatus(CoreAdminOperation.java:329)
> at org.apache.solr.handler.admin.StatusOp.execute(StatusOp.java:54)
> at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:367)
> {noformat}
> STATUS keeps failing until Solr is restarted.
> The NPE for CoreAdmin STATUS is a regression in 8.6. It seems to be caused by
> https://github.com/apache/lucene-solr/commit/17ae79b0905b2bf8635c1b260b30807cae2f5463#diff-9652fe8353b7eff59cd6f128bb2699d88361e670b840ee5ca1018b1bc45584d1R324
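
For illustration only, the failure mode described above (two concurrent creates for one name, with the second call's error handling removing the descriptor that the first call registered) could be avoided by reserving the core name atomically before any state is touched. The sketch below is not Solr code and not the fix applied for this issue: CoreRegistrySketch, its fields, and openCore are hypothetical stand-ins, and only java.util.concurrent classes are used.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a core registry; this is NOT Solr's CoreContainer.
class CoreRegistrySketch {
    // Names whose creation is currently in flight (hypothetical field).
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    // Descriptors (reduced to an instance path here) and live cores, kept in step.
    private final Map<String, String> descriptors = new ConcurrentHashMap<>();
    private final Map<String, Object> cores = new ConcurrentHashMap<>();

    Object create(String name, String instancePath) {
        // Set.add is atomic on this set: only one caller can reserve a given name.
        if (!pending.add(name)) {
            throw new IllegalStateException("Core '" + name + "' is already being created");
        }
        try {
            // Reject duplicates before touching any state owned by the existing core.
            if (cores.containsKey(name)) {
                throw new IllegalStateException("Core '" + name + "' already exists");
            }
            descriptors.put(name, instancePath);
            try {
                Object core = openCore(name, instancePath);
                cores.put(name, core);
                return core;
            } catch (RuntimeException e) {
                // Roll back only the descriptor that this call registered.
                descriptors.remove(name);
                throw e;
            }
        } finally {
            // Release the reservation whether creation succeeded or failed.
            pending.remove(name);
        }
    }

    // Hypothetical placeholder for the expensive step (opening the index writer).
    Object openCore(String name, String instancePath) {
        return new Object();
    }

    boolean hasCore(String name) {
        return cores.containsKey(name);
    }

    boolean hasDescriptor(String name) {
        return descriptors.containsKey(name);
    }

    public static void main(String[] args) {
        CoreRegistrySketch registry = new CoreRegistrySketch();
        registry.create("demo", "/tmp/demo");
        try {
            registry.create("demo", "/tmp/demo");
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
        System.out.println("core=" + registry.hasCore("demo")
                + " descriptor=" + registry.hasDescriptor("demo"));
    }
}
```

In this sketch a duplicate request is rejected before it registers or removes anything, so the descriptor belonging to the first, successful call stays in place and a later status lookup for that name would still find both the core and its descriptor.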