Lee Hinman created LUCENE-6046:
----------------------------------

             Summary: RegExp.toAutomaton high memory use
                 Key: LUCENE-6046
                 URL: https://issues.apache.org/jira/browse/LUCENE-6046
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/queryparser
    Affects Versions: 4.10.1
            Reporter: Lee Hinman
            Priority: Minor

When creating an automaton from an org.apache.lucene.util.automaton.RegExp, the automaton can use so much memory that it exceeds the maximum array size for Java.

The following caused an OutOfMemoryError with a 32 GB heap:

{noformat}
new RegExp("\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}").toAutomaton();
{noformat}

When the heap is increased to 60 GB, the following exception is thrown instead:

{noformat}
  1> java.lang.IllegalArgumentException: requested array size 2147483624 exceeds maximum array in java (2147483623)
  1>     __randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0)
  1>     org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168)
  1>     org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295)
  1>     org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639)
  1>     org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
  1>     org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62)
  1>     org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51)
  1>     org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477)
  1>     org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426)
{noformat}
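
The stack trace above fails inside Operations.determinize, so the growth plausibly comes from the NFA-to-DFA subset construction. The sketch below is not Lucene code and does not use the reported pattern. It is a self-contained illustration, under stated assumptions, of how a counted repetition after a looping prefix can make determinization exponential, and of how a state budget turns that into an early, explicit failure. The class DeterminizeSketch, the method countDfaStates, the maxStates parameter, and TooManyStatesException are all hypothetical names, and the textbook pattern (a|b)*a(a|b){n} stands in for the real regex.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Illustration only: NOT Lucene's implementation. All names here are hypothetical.
public class DeterminizeSketch {

    /** Thrown when the subset construction exceeds the caller's state budget. */
    static class TooManyStatesException extends RuntimeException {
        TooManyStatesException(int limit) {
            super("determinization exceeded " + limit + " states");
        }
    }

    /**
     * One step of the NFA for (a|b)*a(a|b){n}.
     * State 0 loops on 'a' and 'b' and also moves to state 1 on 'a';
     * states 1..n advance on either symbol; state n+1 accepts and has no
     * outgoing transitions. A set of NFA states is encoded as a bitmask.
     */
    static long step(long subset, char symbol, int n) {
        long next = 0L;
        for (int s = 0; s <= n; s++) {
            if ((subset & (1L << s)) == 0) {
                continue; // state s is not in the current subset
            }
            if (s == 0) {
                next |= 1L;          // self-loop on both symbols
                if (symbol == 'a') {
                    next |= 2L;      // 0 --a--> 1
                }
            } else {
                next |= 1L << (s + 1); // s --a|b--> s+1
            }
        }
        return next;
    }

    /**
     * Runs the subset construction and returns the number of reachable DFA
     * states, or throws once more than maxStates subsets have been created.
     */
    static int countDfaStates(int n, int maxStates) {
        if (n < 0 || n > 60) {
            throw new IllegalArgumentException("n must be in [0, 60]");
        }
        Set<Long> seen = new HashSet<>();
        Deque<Long> work = new ArrayDeque<>();
        long start = 1L; // the subset {0}
        seen.add(start);
        work.add(start);
        while (!work.isEmpty()) {
            long current = work.poll();
            for (char symbol : new char[] {'a', 'b'}) {
                long next = step(current, symbol, n);
                if (seen.add(next)) {
                    if (seen.size() > maxStates) {
                        throw new TooManyStatesException(maxStates);
                    }
                    work.add(next);
                }
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 7, 11}) {
            System.out.println("n=" + n + " -> " + countDfaStates(n, 10_000) + " DFA states");
        }
        try {
            countDfaStates(20, 10_000);
        } catch (TooManyStatesException e) {
            System.out.println("n=20 -> " + e.getMessage());
        }
    }
}
```

For this pattern the NFA has n+2 states but the DFA has 2^(n+1), because it has to remember which of the last n+1 symbols were 'a'. Whether the reported regex grows for exactly the same reason is an assumption, but its shape (a starred character class, a literal, then a {50,200} repetition) is similar, and a budget check of this kind would fail fast instead of exhausting the heap.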