[ https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nik Everett updated LUCENE-6046:
--------------------------------
    Attachment: LUCENE-6046.patch

First cut at a patch.  Adds maxDeterminizedStates to Operations.determinize and
pipes it through to tons of places.  I think it's important never to hide when
determinize is called because of how expensive it can be.  Forcing callers of
MinimizationOperations.minimize, Operations.reverse, Operations.minus, etc. to
specify maxDeterminizedStates makes it clear that the automaton might be
determinized during those operations.
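To make the idea of a determinization cap concrete, here is a minimal sketch of subset construction that refuses to create more than maxDeterminizedStates DFA states. It assumes nothing about Lucene's actual code: the class `CappedDeterminize`, its `determinize` and `nthFromEnd` methods, and `TooManyStatesException` are hypothetical names for illustration, and accepting states are omitted for brevity. It runs on the textbook NFA for `(a|b)*a(a|b){n}`, whose DFA needs 2^(n+1) states, so even a small counted repetition shows the kind of exponential growth that a bounded repetition such as `{50,200}` can plausibly trigger.

```java
import java.util.*;

/** Hypothetical sketch, not Lucene's Operations.determinize. */
public class CappedDeterminize {

    /** Hypothetical unchecked exception thrown once the state cap would be exceeded. */
    static class TooManyStatesException extends RuntimeException {
        TooManyStatesException(int max) {
            super("determinization would exceed " + max + " states");
        }
    }

    /**
     * Subset construction. nfa.get(s) maps a symbol to the successors of NFA state s.
     * Returns the DFA transition table: row d, column k is the target of DFA state d
     * on alphabet[k], or -1 for no transition. Never creates more than
     * maxDeterminizedStates DFA states; throws instead.
     */
    static List<int[]> determinize(List<Map<Character, Set<Integer>>> nfa, int start,
                                   char[] alphabet, int maxDeterminizedStates) {
        Map<Set<Integer>, Integer> ids = new HashMap<>();   // subset of NFA states -> DFA id
        List<Set<Integer>> subsets = new ArrayList<>();     // DFA id -> subset
        List<int[]> delta = new ArrayList<>();
        Set<Integer> init = new TreeSet<>(List.of(start));
        ids.put(init, 0);
        subsets.add(init);
        // Breadth-first: 'subsets' grows while we iterate, so each subset is expanded once.
        for (int d = 0; d < subsets.size(); d++) {
            int[] row = new int[alphabet.length];
            Arrays.fill(row, -1);
            for (int k = 0; k < alphabet.length; k++) {
                Set<Integer> next = new TreeSet<>();
                for (int s : subsets.get(d)) {
                    next.addAll(nfa.get(s).getOrDefault(alphabet[k], Set.of()));
                }
                if (next.isEmpty()) continue;
                Integer id = ids.get(next);
                if (id == null) {
                    // The cap: refuse to grow past the caller's limit.
                    if (subsets.size() >= maxDeterminizedStates) {
                        throw new TooManyStatesException(maxDeterminizedStates);
                    }
                    id = subsets.size();
                    ids.put(next, id);
                    subsets.add(next);
                }
                row[k] = id;
            }
            delta.add(row);
        }
        return delta;
    }

    /** NFA for (a|b)*a(a|b){n}; state n + 1 would be accepting (not tracked here). */
    static List<Map<Character, Set<Integer>>> nthFromEnd(int n) {
        List<Map<Character, Set<Integer>>> nfa = new ArrayList<>();
        for (int i = 0; i <= n + 1; i++) nfa.add(new HashMap<>());
        nfa.get(0).put('a', Set.of(0, 1));   // keep looping, or guess this 'a' is the marked one
        nfa.get(0).put('b', Set.of(0));
        for (int i = 1; i <= n; i++) {        // then exactly n more symbols of either kind
            nfa.get(i).put('a', Set.of(i + 1));
            nfa.get(i).put('b', Set.of(i + 1));
        }
        return nfa;
    }

    public static void main(String[] args) {
        char[] ab = {'a', 'b'};
        System.out.println(determinize(nthFromEnd(2), 0, ab, 10_000).size()); // 8 = 2^3
        try {
            determinize(nthFromEnd(20), 0, ab, 10_000);  // would need 2^21 states
        } catch (TooManyStatesException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With n = 20 the construction stops after 10,000 states instead of trying to allocate all 2^21, which is the behavior a maxDeterminizedStates argument is meant to guarantee.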

I added an unchecked exception for when the Automaton can't be determinized
within the specified number of states, but I'm really tempted to change it to a
checked exception to make it super obvious when determinization might occur.
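The checked-versus-unchecked trade-off can be sketched in plain Java. Assuming a hypothetical checked `DeterminizeLimitException` (not a Lucene type), any method that determinizes internally has to declare it, so the possibility of determinization becomes visible in every signature along the call chain; an unchecked exception compiles without any `throws` clause. The `needed` parameter is a made-up stand-in for the size of the DFA that would be built.

```java
/** Hypothetical sketch of the checked-exception alternative; none of these names are Lucene's. */
public class CheckedCapDemo {

    /** Hypothetical checked exception: every caller must catch it or declare it. */
    static class DeterminizeLimitException extends Exception {
        DeterminizeLimitException(int needed, int max) {
            super("needs " + needed + " states, limit is " + max);
        }
    }

    /** Stand-in for determinization; 'needed' fakes the size of the DFA it would build. */
    static int determinize(int needed, int maxDeterminizedStates)
            throws DeterminizeLimitException {
        if (needed > maxDeterminizedStates) {
            throw new DeterminizeLimitException(needed, maxDeterminizedStates);
        }
        return needed;
    }

    /**
     * Stand-in for an operation such as minimize that determinizes internally.
     * Deleting the throws clause below is a compile error, so the cost shows up
     * in this method's signature too.
     */
    static int minimize(int needed, int maxDeterminizedStates)
            throws DeterminizeLimitException {
        return determinize(needed, maxDeterminizedStates);
    }

    public static void main(String[] args) {
        try {
            System.out.println(minimize(8, 10_000));
            System.out.println(minimize(2_097_152, 10_000));
        } catch (DeterminizeLimitException e) {
            System.out.println("gave up: " + e.getMessage());
        }
    }
}
```

The cost of this design is that every intermediate caller must either handle the exception or add it to its own signature, which is exactly the visibility the comment above is after.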

> RegExp.toAutomaton high memory use
> ----------------------------------
>
>                 Key: LUCENE-6046
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6046
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/queryparser
>    Affects Versions: 4.10.1
>            Reporter: Lee Hinman
>            Assignee: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-6046.patch
>
>
> When creating an automaton from an org.apache.lucene.util.automaton.RegExp,
> it's possible for the automaton to use so much memory it exceeds the maximum
> array size for Java.
> The following caused an OutOfMemoryError with a 32 GB heap:
> {noformat}
> new RegExp("\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}").toAutomaton();
> {noformat}
> When increased to a 60 GB heap, the following exception is thrown:
> {noformat}
>   1> java.lang.IllegalArgumentException: requested array size 2147483624 exceeds maximum array in java (2147483623)
>   1>     __randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0)
>   1>     org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168)
>   1>     org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295)
>   1>     org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639)
>   1>     org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
>   1>     org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62)
>   1>     org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51)
>   1>     org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477)
>   1>     org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
