Manybubbles has uploaded a new change for review. https://gerrit.wikimedia.org/r/170973
Change subject: Limit complexity explosion during determinization ...................................................................... Limit complexity explosion during determinization Background: When a regex is compiled to a DFA its actually compiled to an NFA and then determinized. In the worst case the process of determinization produces a DFA that is exponentially (like 2^n not n^2) bigger than the NFA. And the NFA is proportional in size to the complexity of the regex. The fix: Create a new parameter call maxDeterminizedStates and send it to the regex compiler as the maximum number of states to allow the determinized DFA have. If the DFA has more than that many states we throw and error back to the user. The gory details: This isn't supported in the version of Lucene that we use so we had to backport it from trunk. It'll be in 4.10.3 and 5.0. All the XClasses are backports from Lucene. Change-Id: Ieba3c611405f654e8620bf49b03257aea5df1f9d --- A src/main/java/org/apache/lucene/util/XBytesRefBuilder.java A src/main/java/org/apache/lucene/util/XIntsRefBuilder.java A src/main/java/org/apache/lucene/util/XUnicodeUtil.java A src/main/java/org/apache/lucene/util/automaton/XAutomata.java A src/main/java/org/apache/lucene/util/automaton/XAutomaton.java A src/main/java/org/apache/lucene/util/automaton/XAutomatonProvider.java A src/main/java/org/apache/lucene/util/automaton/XByteRunAutomaton.java A src/main/java/org/apache/lucene/util/automaton/XCharacterRunAutomaton.java A src/main/java/org/apache/lucene/util/automaton/XCompiledAutomaton.java A src/main/java/org/apache/lucene/util/automaton/XDaciukMihovAutomatonBuilder.java A src/main/java/org/apache/lucene/util/automaton/XMinimizationOperations.java A src/main/java/org/apache/lucene/util/automaton/XOperations.java A src/main/java/org/apache/lucene/util/automaton/XRegExp.java A src/main/java/org/apache/lucene/util/automaton/XRunAutomaton.java A src/main/java/org/apache/lucene/util/automaton/XSortedIntSet.java A src/main/java/org/apache/lucene/util/automaton/XStatePair.java A src/main/java/org/apache/lucene/util/automaton/XTooComplexToDeterminizeException.java A src/main/java/org/apache/lucene/util/automaton/XTransition.java A src/main/java/org/apache/lucene/util/automaton/XUTF32ToUTF8.java A src/main/java/org/apache/lucene/util/fst/XUtil.java M src/main/java/org/wikimedia/search/extra/regex/SourceRegexFilter.java M src/main/java/org/wikimedia/search/extra/regex/ngram/NGramAutomaton.java M src/main/java/org/wikimedia/search/extra/regex/ngram/NGramExtractor.java M src/main/java/org/wikimedia/search/extra/regex/ngram/package-info.java M src/main/java/org/wikimedia/search/extra/util/package-info.java A src/test/java/org/apache/lucene/util/automaton/AutomatonTestUtil.java A src/test/java/org/apache/lucene/util/automaton/TestAutomaton.java A src/test/java/org/apache/lucene/util/automaton/TestCompiledAutomaton.java A src/test/java/org/apache/lucene/util/automaton/TestDeterminism.java A src/test/java/org/apache/lucene/util/automaton/TestDeterminizeLexicon.java A src/test/java/org/apache/lucene/util/automaton/TestMinimize.java A src/test/java/org/apache/lucene/util/automaton/TestOperations.java A src/test/java/org/apache/lucene/util/automaton/TestRegExp.java M src/test/java/org/wikimedia/search/extra/regex/SourceRegexFilterTest.java M src/test/java/org/wikimedia/search/extra/regex/ngram/NGramAutomatonTest.java M src/test/java/org/wikimedia/search/extra/regex/ngram/NGramExtractorTest.java 36 files changed, 10,195 insertions(+), 70 deletions(-) git pull ssh://gerrit.wikimedia.org:29418/search/extra refs/changes/73/170973/1 -- To view, visit https://gerrit.wikimedia.org/r/170973 To unsubscribe, visit https://gerrit.wikimedia.org/r/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ieba3c611405f654e8620bf49b03257aea5df1f9d Gerrit-PatchSet: 1 Gerrit-Project: search/extra Gerrit-Branch: master Gerrit-Owner: Manybubbles <never...@wikimedia.org> _______________________________________________ MediaWiki-commits mailing list MediaWiki-commits@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits