[
https://issues.apache.org/jira/browse/SOLR-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080124#comment-14080124
]
Steve Davids commented on SOLR-5986:
------------------------------------
No Lucene code appears to honor a thread interrupt specifically, so if
Solr/Lucene is busy enumerating terms in a long-running loop, sending an
interrupt won't actually do anything. The Java code needs to check whether the
thread has been interrupted and, if so, bail out of the current operation.
Blur does this by creating its own "ExitableTerms", "ExitableTermsEnum", etc.,
where every call to the enum's next method checks whether the thread has been
interrupted; if it has, an exception is thrown, which halts processing of the
query.
https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=blur-store/src/main/java/org/apache/blur/index/ExitableReader.java;h=8321dd27d3537ee239f876448e56e8296407700b;hb=61480125dee51c469a4921004f6daf590410bca6
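To make the idea concrete, here is a minimal sketch of the pattern in plain
Java, with no Lucene or Blur dependency. It is not Blur's code: ExitableIterator,
ExitingIterationException and countAll are hypothetical names made up for
illustration, and a plain java.util.Iterator stands in for a terms enum.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical exception type, thrown when an enumeration is abandoned. */
class ExitingIterationException extends RuntimeException {
    ExitingIterationException(String message) {
        super(message);
    }
}

/**
 * Hypothetical stand-in for an "exitable" terms enum: it delegates to another
 * iterator but checks the calling thread's interrupt flag on every next().
 */
class ExitableIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;

    ExitableIterator(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        // isInterrupted() reads the flag without clearing it, so callers
        // further up the stack can still observe the interrupt.
        if (Thread.currentThread().isInterrupted()) {
            throw new ExitingIterationException("interrupted while enumerating");
        }
        return delegate.next();
    }
}

class ExitableIteratorDemo {
    /** Counts the elements; returns -1 if the enumeration was abandoned. */
    static int countAll(Iterator<String> terms) {
        int count = 0;
        try {
            while (terms.hasNext()) {
                terms.next();
                count++;
            }
            return count;
        } catch (ExitingIterationException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        Iterator<String> terms = Arrays.asList("a", "b", "c").iterator();
        System.out.println(countAll(new ExitableIterator<>(terms)));

        // Simulate an interrupt arriving before the next enumeration starts.
        Thread.currentThread().interrupt();
        terms = Arrays.asList("a", "b", "c").iterator();
        System.out.println(countAll(new ExitableIterator<>(terms)));
        Thread.interrupted(); // clear the flag again
    }
}
```

The per-call check is cheap, and it is the only way the loop gets a chance to
notice the interrupt; a loop that never checks the flag keeps running.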
Performing the thread interrupt check within Lucene seems reasonable for
operations that may take a long time to complete; enumerating terms is one of
them.
> Don't allow runaway queries from harming Solr cluster health or search
> performance
> ----------------------------------------------------------------------------------
>
> Key: SOLR-5986
> URL: https://issues.apache.org/jira/browse/SOLR-5986
> Project: Solr
> Issue Type: Improvement
> Components: search
> Reporter: Steve Davids
> Priority: Critical
> Fix For: 4.9
>
>
> The intent of this ticket is to have all distributed search requests stop
> wasting CPU cycles on requests that have already timed out or are so
> complicated that they won't be able to execute. We have come across a case
> where a nasty wildcard query within a proximity clause was causing the
> cluster to enumerate terms for hours even though the query timeout was set to
> minutes. This caused a noticeable slowdown within the system, which made us
> restart the replicas that happened to service that one request. In the worst
> case, users with a relatively low zk timeout value will have nodes start
> dropping from the cluster due to long GC pauses.
> [~amccurry] built a mechanism into Apache Blur to help with the issue in
> BLUR-142 (see the commit comment for code, though look at the latest code on
> the trunk for newer bug fixes).
> Solr should be able to either prevent these problematic queries from running
> by some heuristic (possibly estimated size of heap usage) or be able to
> execute a thread interrupt on all query threads once the time threshold is
> met. This issue mirrors what others have discussed on the mailing list:
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200903.mbox/%[email protected]%3E
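As a rough illustration of the second option above (interrupting query threads
once the time threshold is met), the sketch below uses only
java.util.concurrent. It is not Solr code: QueryTimeoutDemo, runWithTimeout and
enumerateUntilInterrupted are hypothetical names, and the "query" is just a
loop that polls the interrupt flag, which is exactly the cooperation the
interrupt needs in order to have any effect.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class QueryTimeoutDemo {
    /** Hypothetical runaway "query": spins until its thread is interrupted. */
    static long enumerateUntilInterrupted() {
        long visited = 0;
        while (!Thread.currentThread().isInterrupted()) {
            visited++;
        }
        return visited;
    }

    /**
     * Runs the task on a worker thread. Returns true if it finished within
     * the timeout, false if it was cancelled (interrupted) instead.
     */
    static <T> boolean runWithTimeout(Callable<T> task, long timeoutMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<T> future = pool.submit(task);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // cancel(true) interrupts the worker thread running the task.
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's flag
            return false;
        } finally {
            pool.shutdown();
            try {
                // Give the worker a moment to observe the interrupt and exit.
                pool.awaitTermination(1, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(runWithTimeout(() -> 42, 1000));
        System.out.println(
            runWithTimeout(QueryTimeoutDemo::enumerateUntilInterrupted, 50));
    }
}
```

If the task never checked the interrupt flag, cancel(true) would mark the
Future as cancelled while the worker thread kept burning CPU, which is the
situation this issue describes.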
--
This message was sent by Atlassian JIRA
(v6.2#6252)