Lee Hinman created LUCENE-6046:
----------------------------------

             Summary: RegExp.toAutomaton high memory use
                 Key: LUCENE-6046
                 URL: https://issues.apache.org/jira/browse/LUCENE-6046
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/queryparser
    Affects Versions: 4.10.1
            Reporter: Lee Hinman
            Priority: Minor


When creating an automaton from an org.apache.lucene.util.automaton.RegExp,
the automaton can use so much memory that it exceeds the maximum array size
for Java.

The following caused an OutOfMemoryError with a 32 GB heap:

{noformat}
new RegExp("\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}").toAutomaton();
{noformat}

When the heap is increased to 60 GB, the following exception is thrown:

{noformat}
  1> java.lang.IllegalArgumentException: requested array size 2147483624 exceeds maximum array in java (2147483623)
  1>     __randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0)
  1>     org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168)
  1>     org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295)
  1>     org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639)
  1>     org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
  1>     org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62)
  1>     org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51)
  1>     org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477)
  1>     org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426)
{noformat}
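
The trace shows the array growing inside Operations.determinize. As a rough
illustration of why determinization can blow up on a pattern that combines a
leading "*" with a bounded repetition, here is a minimal standalone sketch of
subset construction. It does not use Lucene and is not Lucene's algorithm; the
class name DeterminizeBlowup, the method countDfaStates, the maxStates cap, and
the toy pattern (a|b)*a(a|b){n} are all assumptions made for illustration.
Whether the reported regex explodes for exactly this reason is a guess, not
something the trace confirms.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Set;

public class DeterminizeBlowup {

    /**
     * Runs subset construction on a hand-built NFA for (a|b)*a(a|b){n}
     * and returns the number of reachable DFA states, or -1 once more
     * than maxStates states have been discovered.
     *
     * NFA states are bits of a long: state 0 loops on a and b and moves
     * to state 1 on a; state i (1..n) moves to i+1 on a or b; state n+1
     * is accepting and has no outgoing transitions. Requires n <= 61.
     */
    static int countDfaStates(int n, int maxStates) {
        // Bit mask selecting NFA states 1..n, the ones that advance by one.
        long shiftable = ((1L << (n + 1)) - 1) & ~1L;

        Set<Long> seen = new HashSet<>();
        ArrayDeque<Long> work = new ArrayDeque<>();
        seen.add(1L);   // initial DFA state is the NFA state set {0}
        work.add(1L);

        while (!work.isEmpty()) {
            long cur = work.poll();
            for (int sym = 0; sym < 2; sym++) {   // sym 0 is 'a', sym 1 is 'b'
                // State 0 always stays active; states 1..n each advance.
                long next = 1L | ((cur & shiftable) << 1);
                if (sym == 0) {
                    next |= 2L;   // reading 'a' in state 0 also enters state 1
                }
                if (seen.add(next)) {
                    if (seen.size() > maxStates) {
                        return -1;   // give up instead of exhausting memory
                    }
                    work.add(next);
                }
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 4, 8, 12}) {
            System.out.println("n=" + n + " dfaStates=" + countDfaStates(n, 1_000_000));
        }
    }
}
```

In this toy case the NFA has n + 2 states while the DFA has 2^(n+1), so each
extra repetition doubles the determinized size. A cap like the hypothetical
maxStates above turns a runaway allocation into an early, reportable failure.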


