Jim Ferenczi created LUCENE-7914:
------------------------------------

             Summary: Add safeguards to RegExp.toAutomaton and Operations
                 Key: LUCENE-7914
                 URL: https://issues.apache.org/jira/browse/LUCENE-7914
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Jim Ferenczi


When creating an automaton from a regexp, some operators can create more states 
than maxDeterminizedStates:

<code>
a{10000}
</code>

The example above <b>creates</b> a single path with 10k states already 
determinized so maxDeterminizedStates is not checked. 
Some operations on automaton like Operations.isFinite or 
Operations.topoSortStates are recursive and the maximum level of recursion 
depends on the longest path in the automaton. So a large automaton like above 
can exceed java's stack.
In most of the cases we are covered by maxDeterminizedStates but there will 
always be adversarial cases where a large automaton is created from a small 
input so I think we should also have safeguards in the recursive methods. 

I've attached a patch that adds a max recursion level to Operations.isFinite 
and Operations.topoSortStates in order to limit stack overflows. The limit is 
set to 1000 so any automaton with a path bigger than 1000 would throw an 
IllegalStateException.
The patch also uses maxDeterminizedStates to limit the number of states that a 
repeat operator can create and throw a TooComplex..Exception when this limit is 
reached.
Finally the patch adds the ability to skip Operations.isFinite on 
AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
finite automatons only.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to