Mike Drob created SOLR-13190:
--------------------------------

             Summary: Fuzzy search treated as server error instead of client 
error when terms are too complex
                 Key: SOLR-13190
                 URL: https://issues.apache.org/jira/browse/SOLR-13190
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: search
    Affects Versions: master (9.0)
            Reporter: Mike Drob
            Assignee: Mike Drob


We've seen a fuzzy search exceed the automaton determinization limit and get 
reported as a server error. Handling of this case should be improved by:
1) reporting it as a client error, because it is similar to a "too many 
boolean clauses" failure in how an operator should deal with it
2) reporting which field is causing the error, since that currently must be 
deduced from adjacent query logs and can be difficult if there are multiple 
terms in the search

This limit was added to defend against adversarial regexes but somehow hits 
fuzzy terms as well. I don't understand enough about the automaton mechanisms 
to really know how to approach a fix there, but improving the operability is a 
good first step.
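
As a rough illustration of the two proposed changes, the sketch below catches 
the "too complex" failure and rethrows it with a 400 status and the field name 
in the message. It is not the actual patch: every class and method here 
(FuzzyErrorSketch, TooComplexException, QueryException, determinize, 
runFuzzyQuery) is a hypothetical stand-in so the example compiles without 
Lucene or Solr on the classpath, and the state check is a simplification of 
what the real determinization does. In Solr itself the equivalent would 
presumably be a SolrException with ErrorCode.BAD_REQUEST wrapping Lucene's 
TooComplexToDeterminizeException.

```java
// Hypothetical stand-ins only; none of these names are real Lucene/Solr API.
class FuzzyErrorSketch {

    /** Stand-in for Lucene's TooComplexToDeterminizeException. */
    static class TooComplexException extends RuntimeException {
        TooComplexException(String message) {
            super(message);
        }
    }

    /** Stand-in for a Solr-style exception that carries an HTTP status code. */
    static class QueryException extends RuntimeException {
        final int code;

        QueryException(int code, String message, Throwable cause) {
            super(message, cause);
            this.code = code;
        }
    }

    /** Simplified limit check: fails when the estimated state count exceeds the limit. */
    static void determinize(int estimatedStates, int maxStates) {
        if (estimatedStates > maxStates) {
            throw new TooComplexException(
                "Determinizing automaton would result in more than " + maxStates + " states.");
        }
    }

    /** Runs the simulated fuzzy rewrite and converts the failure into a client error. */
    static String runFuzzyQuery(String field, int estimatedStates, int maxStates) {
        try {
            determinize(estimatedStates, maxStates);
            return "ok";
        } catch (TooComplexException e) {
            // 400 rather than 500: the query, not the server, needs to change.
            // The field name is included so it need not be deduced from logs.
            throw new QueryException(
                400, "Fuzzy term too complex for field '" + field + "': " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        try {
            runFuzzyQuery("title", 13632, 10000);
        } catch (QueryException e) {
            System.out.println(e.code + " " + e.getMessage());
        }
    }
}
```

Keeping the original exception as the cause preserves the full stack trace for 
anyone who still needs to dig into the automaton side.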

relevant stack trace:

{noformat}
org.apache.lucene.util.automaton.TooComplexToDeterminizeException: Determinizing automaton with 13632 states and 21348 transitions would result in more than 10000 states.
        at org.apache.lucene.util.automaton.Operations.determinize(Operations.java:746)
        at org.apache.lucene.util.automaton.RunAutomaton.<init>(RunAutomaton.java:69)
        at org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(ByteRunAutomaton.java:32)
        at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:247)
        at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:133)
        at org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:143)
        at org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:154)
        at org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:78)
        at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:58)
        at org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67)
        at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:310)
        at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:667)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:442)
        at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:200)
        at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1604)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1420)
        at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:567)
        at org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1435)
        at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:374)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2559)
{noformat}


