[ 
https://issues.apache.org/jira/browse/SOLR-9330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15513882#comment-15513882
 ] 

Mike Drob commented on SOLR-9330:
---------------------------------

Ah, I see where the difference is, yes. In my case, the client process getting 
the statistics is an external monitoring application that polls them every 15 
seconds and charts them. Since replicas can move, and their number can grow and 
shrink to accommodate usage, solving races like this is a very complicated 
problem. At the end of the day, I don't care if my monitoring system misses one 
round of statistics; I'm more concerned about scary exceptions in the log that 
the ops team has to deal with.
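
To make that trade-off concrete, here is a minimal sketch of a statistics 
collector that reports a closed reader quietly instead of letting the exception 
reach the log. It is only an illustration, not the change proposed in any of 
the attached patches: ReaderClosedException, SketchReader and TolerantStats are 
hypothetical stand-ins so the example compiles without Lucene or Solr.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in for Lucene's AlreadyClosedException so the sketch needs no Lucene jar. */
class ReaderClosedException extends IllegalStateException {
    ReaderClosedException(String msg) { super(msg); }
}

/** Hypothetical reader whose version cannot be read once it has been closed. */
class SketchReader {
    private volatile boolean closed = false;
    private final long version;

    SketchReader(long version) { this.version = version; }

    void close() { closed = true; }

    long getVersion() {
        if (closed) throw new ReaderClosedException("this IndexReader is closed");
        return version;
    }
}

/** Hypothetical stats collector; not Solr's SolrInfoMBeanHandler. */
final class TolerantStats {
    /** Collects statistics; a reader closed mid-request yields a marker entry, not an exception. */
    static Map<String, Object> collect(String searcherName, SketchReader reader) {
        Map<String, Object> stats = new LinkedHashMap<>();
        stats.put("searcherName", searcherName);
        try {
            stats.put("indexVersion", reader.getVersion());
        } catch (ReaderClosedException e) {
            // The searcher was closed by a concurrent reload: this round of
            // statistics is incomplete, which the monitoring client can tolerate.
            stats.put("closed", true);
        }
        return stats;
    }

    public static void main(String[] args) {
        SketchReader reader = new SketchReader(42L);
        System.out.println(collect("main", reader)); // reader still open
        reader.close();                               // simulate the reload closing it
        System.out.println(collect("main", reader)); // degraded, but no exception
    }
}
```

The monitoring side then only has to treat a "closed" entry as a skipped round.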

> Race condition between core reload and statistics request
> ---------------------------------------------------------
>
>                 Key: SOLR-9330
>                 URL: https://issues.apache.org/jira/browse/SOLR-9330
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 5.5
>            Reporter: Andrey Kudryavtsev
>         Attachments: SOLR-9330.patch, SOLR-9390.patch, SOLR-9390.patch, 
> SOLR-9390.patch, SOLR-9390.patch, too_sync.patch
>
>
> We happened to execute these two requests consecutively in Solr 5.5:
> * Core reload: /admin/cores?action=RELOAD&core=_coreName_
> * Check core statistics: /_coreName_/admin/mbeans?stats=true
> Sometimes the second request ends with this error:
> {code}
> ERROR org.apache.solr.servlet.HttpSolrCall - null:org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed
>       at org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:274)
>       at org.apache.lucene.index.StandardDirectoryReader.getVersion(StandardDirectoryReader.java:331)
>       at org.apache.lucene.index.FilterDirectoryReader.getVersion(FilterDirectoryReader.java:119)
>       at org.apache.lucene.index.FilterDirectoryReader.getVersion(FilterDirectoryReader.java:119)
>       at org.apache.solr.search.SolrIndexSearcher.getStatistics(SolrIndexSearcher.java:2404)
>       at org.apache.solr.handler.admin.SolrInfoMBeanHandler.addMBean(SolrInfoMBeanHandler.java:164)
>       at org.apache.solr.handler.admin.SolrInfoMBeanHandler.getMBeanInfo(SolrInfoMBeanHandler.java:134)
>       at org.apache.solr.handler.admin.SolrInfoMBeanHandler.handleRequestBody(SolrInfoMBeanHandler.java:65)
>       at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
>       at org.apache.solr.core.SolrCore.execute(SolrCore.java:2082)
>       at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:670)
>       at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:458)
>       at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:225)
>       at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:183)
> {code}
> If my understanding of SolrCore internals is correct, this happens because of 
> the asynchronous nature of the reload request:
> * The new searcher is "registered" in a separate thread
> * The old searcher is closed in the same separate thread, and only after the 
> new one is registered
> * When the old searcher is closing, it removes itself from the map of MBeans
> * If a statistics request arrives before the old searcher is completely 
> removed from everywhere, the exception can happen.
> What do you think about introducing a new parameter for the reload request 
> that makes it fully synchronous? Basically, it would force the reload to call
> {code}
> SolrCore#getSearcher(boolean forceNew, boolean returnSearcher, final Future[] waitSearcher, boolean updateHandlerReopens)
> {code}
> with waitSearcher != null.
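
The synchronous-reload idea in the description can be sketched roughly as 
follows. This is a simplified model under stated assumptions, not Solr code: 
SketchCore, openNewSearcher, reload and the sync flag are hypothetical names, 
and only the Future[] waitSearcher convention is borrowed from the signature 
quoted above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical model of a core; it only mimics the waitSearcher convention. */
final class SketchCore {
    private final ExecutorService searcherExecutor = Executors.newSingleThreadExecutor();
    private final AtomicReference<String> registered = new AtomicReference<>("searcher-1");
    private int generation = 1;

    /**
     * Opens a new searcher on a background thread. If waitSearcher is non-null,
     * its first slot receives a Future that completes once the new searcher is
     * registered (the point after which the old one would be closed).
     */
    synchronized void openNewSearcher(Future<?>[] waitSearcher) {
        String next = "searcher-" + (++generation);
        Future<?> done = searcherExecutor.submit(() -> {
            sleepQuietly(50);     // simulate warming the new searcher
            registered.set(next); // register the new searcher
        });
        if (waitSearcher != null) {
            waitSearcher[0] = done;
        }
    }

    /** Reloads the core; with sync=true the call returns only after registration. */
    void reload(boolean sync) {
        Future<?>[] waitSearcher = sync ? new Future<?>[1] : null;
        openNewSearcher(waitSearcher);
        if (waitSearcher != null) {
            try {
                waitSearcher[0].get(10, TimeUnit.SECONDS);
            } catch (Exception e) {
                throw new IllegalStateException("reload did not complete", e);
            }
        }
    }

    String registeredSearcher() { return registered.get(); }

    void shutdown() { searcherExecutor.shutdown(); }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        SketchCore core = new SketchCore();
        core.reload(true); // blocks until the new searcher is registered
        System.out.println(core.registeredSearcher());
        core.shutdown();
    }
}
```

With sync=false the caller returns immediately and a statistics request issued 
right afterwards may still observe the old searcher, which is the window the 
description is about.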



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
