Hmmmm. +1 to limiting this somehow. Whether it's configurable or not I don't really care; I'd be perfectly fine with reducing it to some sane (handleable) limit. I can argue that having a regex like this is useless from a practical standpoint anyway. But a few of these could make search unresponsive.
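
For what it's worth, here is a rough sketch of what such a limit could look like. Everything in it is hypothetical: WildcardLimiter, countStars and trim are made-up names, not Lucene API, and it assumes '*' is the multi-character wildcard and backslash is the escape character. It counts only '*' (not '?'), treats a run of adjacent '*' as one, and trims along the lines Jack suggests, cutting the pattern right after the Nth '*'.

```java
// Hypothetical helper, not part of Lucene: a sketch of "limit or trim" for wildcard patterns.
final class WildcardLimiter {
    private WildcardLimiter() {}

    /** Counts unescaped '*' wildcards; a run of adjacent '*' counts once. */
    static int countStars(String pattern) {
        int count = 0;
        boolean prevStar = false;
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                i++;               // skip the escaped character
                prevStar = false;
            } else if (c == '*') {
                if (!prevStar) {
                    count++;
                }
                prevStar = true;
            } else {
                prevStar = false;
            }
        }
        return count;
    }

    /**
     * Collapses adjacent '*' and cuts the pattern right after the maxStars-th '*'.
     * When a cut happens the result ends in '*', so it matches a superset of
     * what the original pattern matched.
     */
    static String trim(String pattern, int maxStars) {
        if (maxStars < 1) {
            throw new IllegalArgumentException("maxStars must be >= 1");
        }
        StringBuilder out = new StringBuilder();
        int stars = 0;
        boolean prevStar = false;
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                out.append(c).append(pattern.charAt(++i)); // keep the escape pair as-is
                prevStar = false;
            } else if (c == '*') {
                if (prevStar) {
                    continue;      // collapse "**" into "*"
                }
                out.append(c);
                prevStar = true;
                if (++stars == maxStars) {
                    return out.toString();
                }
            } else {
                out.append(c);
                prevStar = false;
            }
        }
        return out.toString();
    }
}
```

A caller could either reject the query when countStars(pattern) exceeds the limit, or pass the pattern through trim(pattern, 3) before building the query. Whether silently broadening the match is acceptable is a separate question.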
Supporting this kind of edge case doesn't seem worth the effort. Clemens was very clear that this is a test case, the implication being that it's not the result of a required use case. Not complaining at all, mind you; it's great to have stuff like this flushed out.

FWIW,
Erick

On Sat, Jun 28, 2014 at 4:13 AM, Jack Krupansky (JIRA) <[email protected]> wrote:
>
> [ https://issues.apache.org/jira/browse/LUCENE-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046813#comment-14046813 ]
>
> Jack Krupansky edited comment on LUCENE-5791 at 6/28/14 11:11 AM:
> ------------------------------------------------------------------
>
> At least consider clear Javadoc on limitations and performance, such as the
> need to keep wildcard patterns "brief".
>
> Maybe consider a limit of how many wildcards can be used in a single wildcard
> query. Possibly configurable.
>
> Maybe consider a "trim" mode - if too many wildcards appear, simply trim
> trailing portions of the pattern to get under the limit. For example, this
> test case might get trimmed to abc*mno*xyz*. This would still match all of
> the intended matches, albeit also matching some unintended cases. Maybe a
> limit of three wildcards would be reasonable.
>
> Does ? have the same issue, or is it much more linear? Would ???*???*???*???
> be as bad as abc*mno*xyz*pqr* ?
>
> Do adjacent ** get collapsed to a single * ?
>
> Fuzzy query has a very strict limit to assure that it is performant - I would
> think that these two query types should have the same performance goals.
>
>
>> QueryParserUtil, big query with wildcards -> runs endlessly and produces heavy load
>> -----------------------------------------------------------------------------------
>>
>> Key: LUCENE-5791
>> URL: https://issues.apache.org/jira/browse/LUCENE-5791
>> Project: Lucene - Core
>> Issue Type: Bug
>> Components: modules/queryparser
>> Environment: Lucene 4.7.2
>>              Java 6
>> Reporter: Clemens Wyss
>> Attachments: afterdet.png
>>
>>
>> The following "testcase" runs endlessly and produces VERY heavy load.
>> ...
>> String query = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut "
>>     + "labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et "
>>     + "ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. "
>>     + "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt "
>>     + "ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores "
>>     + "et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet";
>> query = query.replaceAll( "\\s+", "*" );
>> try {
>>     QueryParserUtil.parse( query, new String[] { "test" }, new Occur[] { Occur.MUST }, new KeywordAnalyzer() );
>> } catch ( Exception e ) {
>>     Assert.fail( e.getMessage() );
>> }
>> ...
>> I don't say this testcase makes "sense", nevertheless the question remains
>> whether this is a bug or a "feature"?
>> 99% of the time the thread dump/stack trace looks as follows:
>> BasicOperations.determinize(Automaton) line: 680
>> Automaton.determinize() line: 759
>> SpecialOperations.getCommonSuffixBytesRef(Automaton) line: 165
>> CompiledAutomaton.<init>(Automaton, Boolean, boolean) line: 168
>> CompiledAutomaton.<init>(Automaton) line: 91
>> WildcardQuery(AutomatonQuery).<init>(Term, Automaton) line: 67
>> WildcardQuery.<init>(Term) line: 57
>> WildcardQueryNodeBuilder.build(QueryNode) line: 42
>> WildcardQueryNodeBuilder.build(QueryNode) line: 32
>> StandardQueryTreeBuilder(QueryTreeBuilder).processNode(QueryNode, QueryBuilder) line: 186
>> StandardQueryTreeBuilder(QueryTreeBuilder).process(QueryNode) line: 125
>> StandardQueryTreeBuilder(QueryTreeBuilder).build(QueryNode) line: 218
>> StandardQueryTreeBuilder.build(QueryNode) line: 82
>> StandardQueryTreeBuilder.build(QueryNode) line: 53
>> StandardQueryParser(QueryParserHelper).parse(String, String) line: 258
>> StandardQueryParser.parse(String, String) line: 168
>> QueryParserUtil.parse(String, String[], BooleanClause$Occur[], Analyzer) line: 119
>> IndexingTest.queryParserUtilLimit() line: 1450
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.2#6252)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
