Jim Ferenczi updated LUCENE-7914:
    Fix Version/s:     (was: 7.1)

> Add safeguards to RegExp.toAutomaton and Operations
> ---------------------------------------------------
>                 Key: LUCENE-7914
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7914
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Jim Ferenczi
>             Fix For: 7.0, master (8.0)
>         Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, 
> LUCENE-7914.patch, LUCENE-7914.patch
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{10000}
> {code}
> The example above <b>creates</b> a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> infinite automatons only.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to