Manybubbles has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/170973

Change subject: Limit complexity explosion during determinization
......................................................................

Limit complexity explosion during determinization

Background:
When a regex is compiled to a DFA its actually compiled to an NFA and then
determinized.  In the worst case the process of determinization produces a
DFA that is exponentially (like 2^n not n^2) bigger than the NFA.  And the
NFA is proportional in size to the complexity of the regex.

The fix:
Create a new parameter call maxDeterminizedStates and send it to the regex
compiler as the maximum number of states to allow the determinized DFA have.
If the DFA has more than that many states we throw and error back to the
user.

The gory details:
This isn't supported in the version of Lucene that we use so we had to
backport it from trunk.  It'll be in 4.10.3 and 5.0.  All the XClasses are
backports from Lucene.

Change-Id: Ieba3c611405f654e8620bf49b03257aea5df1f9d
---
A src/main/java/org/apache/lucene/util/XBytesRefBuilder.java
A src/main/java/org/apache/lucene/util/XIntsRefBuilder.java
A src/main/java/org/apache/lucene/util/XUnicodeUtil.java
A src/main/java/org/apache/lucene/util/automaton/XAutomata.java
A src/main/java/org/apache/lucene/util/automaton/XAutomaton.java
A src/main/java/org/apache/lucene/util/automaton/XAutomatonProvider.java
A src/main/java/org/apache/lucene/util/automaton/XByteRunAutomaton.java
A src/main/java/org/apache/lucene/util/automaton/XCharacterRunAutomaton.java
A src/main/java/org/apache/lucene/util/automaton/XCompiledAutomaton.java
A 
src/main/java/org/apache/lucene/util/automaton/XDaciukMihovAutomatonBuilder.java
A src/main/java/org/apache/lucene/util/automaton/XMinimizationOperations.java
A src/main/java/org/apache/lucene/util/automaton/XOperations.java
A src/main/java/org/apache/lucene/util/automaton/XRegExp.java
A src/main/java/org/apache/lucene/util/automaton/XRunAutomaton.java
A src/main/java/org/apache/lucene/util/automaton/XSortedIntSet.java
A src/main/java/org/apache/lucene/util/automaton/XStatePair.java
A 
src/main/java/org/apache/lucene/util/automaton/XTooComplexToDeterminizeException.java
A src/main/java/org/apache/lucene/util/automaton/XTransition.java
A src/main/java/org/apache/lucene/util/automaton/XUTF32ToUTF8.java
A src/main/java/org/apache/lucene/util/fst/XUtil.java
M src/main/java/org/wikimedia/search/extra/regex/SourceRegexFilter.java
M src/main/java/org/wikimedia/search/extra/regex/ngram/NGramAutomaton.java
M src/main/java/org/wikimedia/search/extra/regex/ngram/NGramExtractor.java
M src/main/java/org/wikimedia/search/extra/regex/ngram/package-info.java
M src/main/java/org/wikimedia/search/extra/util/package-info.java
A src/test/java/org/apache/lucene/util/automaton/AutomatonTestUtil.java
A src/test/java/org/apache/lucene/util/automaton/TestAutomaton.java
A src/test/java/org/apache/lucene/util/automaton/TestCompiledAutomaton.java
A src/test/java/org/apache/lucene/util/automaton/TestDeterminism.java
A src/test/java/org/apache/lucene/util/automaton/TestDeterminizeLexicon.java
A src/test/java/org/apache/lucene/util/automaton/TestMinimize.java
A src/test/java/org/apache/lucene/util/automaton/TestOperations.java
A src/test/java/org/apache/lucene/util/automaton/TestRegExp.java
M src/test/java/org/wikimedia/search/extra/regex/SourceRegexFilterTest.java
M src/test/java/org/wikimedia/search/extra/regex/ngram/NGramAutomatonTest.java
M src/test/java/org/wikimedia/search/extra/regex/ngram/NGramExtractorTest.java
36 files changed, 10,195 insertions(+), 70 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/search/extra 
refs/changes/73/170973/1


-- 
To view, visit https://gerrit.wikimedia.org/r/170973
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ieba3c611405f654e8620bf49b03257aea5df1f9d
Gerrit-PatchSet: 1
Gerrit-Project: search/extra
Gerrit-Branch: master
Gerrit-Owner: Manybubbles <never...@wikimedia.org>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits

Reply via email to