[ https://issues.apache.org/jira/browse/SOLR-13190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16761956#comment-16761956 ]
Mike Drob commented on SOLR-13190:
----------------------------------

Can you expand on that a little more? Tracing through the code path, we:
1) create a LevenshteinAutomata instance;
2) for each edit distance (0, 1, 2), create a separate instance of the Automaton class;
3) convert each Automaton from UTF-32 to UTF-8 with UTF32ToUTF8;
4) wrap each UTF-8 Automaton in a ByteRunAutomaton, which attempts to determinize it.

The objects produced by 2) are deterministic; the ones produced by 3) are not.

So, a few questions and maybe a theory: Does converting an already deterministic automaton necessarily destroy its determinism? If so, are we sure that we need to do the conversion in the first place? The comments claim that PrefixQuery doesn't need the conversion, so maybe we can get away without it here too? LUCENE-6367 makes me think that FuzzyQuery should subclass AutomatonQuery as well?

> Fuzzy search treated as server error instead of client error when terms are
> too complex
> ---------------------------------------------------------------------------------------
>
>                 Key: SOLR-13190
>                 URL: https://issues.apache.org/jira/browse/SOLR-13190
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: search
>    Affects Versions: master (9.0)
>            Reporter: Mike Drob
>            Assignee: Mike Drob
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We've seen a fuzzy search end up breaking the automaton and getting reported
> as a server error.
> This usage should be improved by
> 1) reporting it as a client error, because an operator should deal with it
> much like a query with too many boolean clauses;
> 2) reporting which field is causing the error, since that currently must be
> deduced from adjacent query logs, which can be difficult if there are multiple
> terms in the search.
> This trigger was added to defend against adversarial regexes but somehow hits
> fuzzy terms as well. I don't understand enough about the automaton mechanisms
> to really know how to approach a fix there, but improving the operability is
> a good first step.
> Relevant stack trace:
> {noformat}
> org.apache.lucene.util.automaton.TooComplexToDeterminizeException: Determinizing automaton with 13632 states and 21348 transitions would result in more than 10000 states.
> 	at org.apache.lucene.util.automaton.Operations.determinize(Operations.java:746)
> 	at org.apache.lucene.util.automaton.RunAutomaton.<init>(RunAutomaton.java:69)
> 	at org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(ByteRunAutomaton.java:32)
> 	at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:247)
> 	at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:133)
> 	at org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:143)
> 	at org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:154)
> 	at org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:78)
> 	at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:58)
> 	at org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67)
> 	at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:310)
> 	at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:667)
> 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:442)
> 	at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:200)
> 	at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1604)
> 	at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1420)
> 	at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:567)
> 	at org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1435)
> 	at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:374)
> 	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)
> 	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
> 	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2559)
> {noformat}
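To make steps 1) and 2) of the comment concrete, here is a toy Levenshtein automaton in plain Java (16+, for records), simulated nondeterministically as a set of (pattern characters consumed, edits spent) states. It is only an illustration: Lucene's LevenshteinAutomata constructs its automata differently, the class and method names below are hypothetical, and the sketch works on Java chars rather than code points for simplicity. Each edit budget k accepts a different set of terms, which mirrors the separate per-distance automata in step 2).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

public class LevenshteinNfa {
    /** NFA state: i pattern characters consumed so far, e edits spent so far. */
    record State(int i, int e) {}

    /** Follows deletion moves (skip one pattern character for one edit) until nothing new appears. */
    static Set<State> closure(Set<State> states, int n, int k) {
        Set<State> result = new HashSet<>(states);
        Deque<State> work = new ArrayDeque<>(states);
        while (!work.isEmpty()) {
            State s = work.pop();
            State skipped = new State(s.i() + 1, s.e() + 1);
            if (s.i() < n && s.e() < k && result.add(skipped)) {
                work.push(skipped);
            }
        }
        return result;
    }

    /** True if word is within k insertions, deletions, or substitutions of pattern. */
    static boolean accepts(String pattern, String word, int k) {
        int n = pattern.length();
        Set<State> current = closure(Set.of(new State(0, 0)), n, k);
        for (char c : word.toCharArray()) {
            Set<State> next = new HashSet<>();
            for (State s : current) {
                if (s.i() < n && pattern.charAt(s.i()) == c) {
                    next.add(new State(s.i() + 1, s.e()));         // match, no edit spent
                }
                if (s.e() < k) {
                    next.add(new State(s.i(), s.e() + 1));         // insertion: extra char in word
                    if (s.i() < n) {
                        next.add(new State(s.i() + 1, s.e() + 1)); // substitution
                    }
                }
            }
            current = closure(next, n, k);
        }
        for (State s : current) {
            if (s.i() == n) {
                return true; // whole pattern consumed within the edit budget
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // One automaton per edit distance, like the per-distance loop described in the comment.
        for (int k = 0; k <= 2; k++) {
            System.out.println(k + ": " + accepts("fuzzy", "fizy", k)); // false, false, true
        }
    }
}
```

"fizy" needs one substitution and one deletion, so only the k = 2 automaton accepts it.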
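On the first question in the comment (whether converting an already deterministic automaton necessarily destroys determinism), a simplified model shows one way nondeterminism can appear. Assume, hypothetically, that each code-point transition is expanded independently into its own chain of UTF-8 byte transitions with fresh intermediate states; Lucene's UTF32ToUTF8 works on code-point ranges and is not reproduced here. Two distinct code points leaving the same state, such as 'é' (U+00E9, bytes C3 A9) and 'ü' (U+00FC, bytes C3 BC), then both start with byte 0xC3, so the source state has two different successors on one input byte. The names below are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Utf8Nondeterminism {
    /** A code-point-labelled transition leaving one source state (hypothetical model). */
    record CodePointEdge(int codePoint, int target) {}

    /** First UTF-8 byte of a code point, as an unsigned value 0..255. */
    static int firstUtf8Byte(int codePoint) {
        byte[] bytes = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
        return bytes[0] & 0xFF;
    }

    /**
     * Naive expansion: every code-point edge becomes its own chain of byte edges with fresh
     * intermediate states. Two edges whose encodings share a lead byte therefore leave the
     * source state on the same byte into two different intermediate states.
     * (Sibling edges in a DFA have distinct code points, so a shared lead byte implies both
     * are multi-byte and each chain has its own intermediate state.)
     */
    static boolean expansionIsNondeterministic(List<CodePointEdge> edgesFromOneState) {
        Set<Integer> leadBytes = new HashSet<>();
        for (CodePointEdge edge : edgesFromOneState) {
            if (!leadBytes.add(firstUtf8Byte(edge.codePoint()))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // U+00E9 (e acute) -> C3 A9 and U+00FC (u umlaut) -> C3 BC share the lead byte 0xC3.
        List<CodePointEdge> accented = List.of(new CodePointEdge(0xE9, 1), new CodePointEdge(0xFC, 2));
        // 'a' and 'b' are distinct single bytes, so nothing collides.
        List<CodePointEdge> ascii = List.of(new CodePointEdge('a', 1), new CodePointEdge('b', 2));
        System.out.println(expansionIsNondeterministic(accented)); // true
        System.out.println(expansionIsNondeterministic(ascii));    // false
    }
}
```

In this model, ASCII-only siblings never collide; collisions come only from multi-byte code points sharing a lead byte. Whether the real conversion could merge those intermediate states instead is the open question the comment raises.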
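The exception in the trace comes from a cap on determinization: Operations.determinize gives up once the result would exceed a state budget (10000 states in the message above). The sketch below is a toy subset construction with the same kind of cap. TooComplexException is a hypothetical stand-in for TooComplexToDeterminizeException, and none of this is Lucene's code. The test NFA accepts strings over {a, b} whose (n+1)-th symbol from the end is 'a', a textbook case where n + 2 NFA states need 2^(n+1) DFA states.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class CappedDeterminize {
    /** Hypothetical stand-in for Lucene's TooComplexToDeterminizeException. */
    static class TooComplexException extends RuntimeException {
        TooComplexException(int limit) {
            super("Determinizing would result in more than " + limit + " states");
        }
    }

    /**
     * Subset construction over an NFA given as nfa.get(state).get(symbol) -> successor states.
     * Returns the number of DFA states, or throws as soon as a new state would exceed maxStates.
     */
    static int determinize(List<Map<Character, Set<Integer>>> nfa, int start, char[] alphabet, int maxStates) {
        Map<Set<Integer>, Integer> ids = new HashMap<>(); // DFA state (set of NFA states) -> id
        Deque<Set<Integer>> work = new ArrayDeque<>();
        Set<Integer> initial = Set.of(start);
        ids.put(initial, 0);
        work.add(initial);
        while (!work.isEmpty()) {
            Set<Integer> current = work.poll();
            for (char c : alphabet) {
                Set<Integer> next = new TreeSet<>();
                for (int s : current) {
                    next.addAll(nfa.get(s).getOrDefault(c, Set.of()));
                }
                if (next.isEmpty() || ids.containsKey(next)) {
                    continue; // no transition, or a subset that is already numbered
                }
                if (ids.size() >= maxStates) {
                    throw new TooComplexException(maxStates);
                }
                ids.put(next, ids.size());
                work.add(next);
            }
        }
        return ids.size();
    }

    /** NFA accepting strings over {a, b} whose (n+1)-th symbol from the end is 'a'. */
    static List<Map<Character, Set<Integer>>> lastSymbolsNfa(int n) {
        List<Map<Character, Set<Integer>>> nfa = new ArrayList<>();
        for (int i = 0; i <= n + 1; i++) {
            nfa.add(new HashMap<>());
        }
        nfa.get(0).put('a', Set.of(0, 1)); // keep looping, or guess this 'a' is the one
        nfa.get(0).put('b', Set.of(0));
        for (int i = 1; i <= n; i++) {     // then count exactly n more symbols
            nfa.get(i).put('a', Set.of(i + 1));
            nfa.get(i).put('b', Set.of(i + 1));
        }
        return nfa;
    }

    public static void main(String[] args) {
        char[] ab = {'a', 'b'};
        List<Map<Character, Set<Integer>>> nfa = lastSymbolsNfa(3); // 5 NFA states
        System.out.println(determinize(nfa, 0, ab, 100));         // 16
        try {
            determinize(nfa, 0, ab, 10);
        } catch (TooComplexException e) {
            System.out.println(e.getMessage()); // ... more than 10 states
        }
    }
}
```

The point of the example is that the blow-up depends on the shape of the automaton, not only its size, which is consistent with the trace: an automaton with 13632 states hit the 10000-state cap.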
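The two improvements proposed in the description (report a client error, and name the field) amount to translating the exception at a point where the field is still known. A minimal plain-Java sketch, assuming hypothetical TooComplexException and RequestException classes rather than the real Lucene and Solr types, and using HTTP 400 to stand for a client error:

```java
import java.util.function.Supplier;

public class FuzzyErrorTranslation {
    /** Hypothetical stand-in for Lucene's TooComplexToDeterminizeException. */
    static class TooComplexException extends RuntimeException {
        TooComplexException(String message) {
            super(message);
        }
    }

    /** Hypothetical client-facing error that carries an HTTP-style status code. */
    static class RequestException extends RuntimeException {
        final int status;

        RequestException(int status, String message, Throwable cause) {
            super(message, cause);
            this.status = status;
        }
    }

    /** Runs one field's rewrite step; an automaton blow-up becomes a 400 that names the field. */
    static <T> T rewriteField(String field, Supplier<T> rewrite) {
        try {
            return rewrite.get();
        } catch (TooComplexException e) {
            throw new RequestException(400,
                "Term on field '" + field + "' is too complex: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        try {
            rewriteField("title", () -> {
                throw new TooComplexException("would result in more than 10000 states");
            });
        } catch (RequestException e) {
            // 400 Term on field 'title' is too complex: would result in more than 10000 states
            System.out.println(e.status + " " + e.getMessage());
        }
    }
}
```

In Solr itself the equivalent would presumably wrap the real exception in a client-error SolrException; where to catch it so that the field name is still available is a design question this sketch does not settle.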