[
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jim Ferenczi updated LUCENE-7914:
---------------------------------
Description:
When creating an automaton from a regexp, some operators can create more states
than maxDeterminizedStates:
{code}
a{10000}
{code}
The example above <b>creates</b> a single path with 10k states already
determinized so maxDeterminizedStates is not checked.
Some operations on automaton like Operations.isFinite or
Operations.topoSortStates are recursive and the maximum level of recursion
depends on the longest path in the automaton. So a large automaton like above
can exceed java's stack.
In most of the cases we are covered by maxDeterminizedStates but there will
always be adversarial cases where a large automaton is created from a small
input so I think we should also have safeguards in the recursive methods.
I've attached a patch that adds a max recursion level to Operations.isFinite
and Operations.topoSortStates in order to limit stack overflows. The limit is
set to 1000 so any automaton with a path bigger than 1000 would throw an
IllegalStateException.
The patch also uses maxDeterminizedStates to limit the number of states that a
repeat operator can create and throw a TooComplex..Exception when this limit is
reached.
Finally the patch adds the ability to skip Operations.isFinite on
AutomatonQuery and uses this as an optimization for PrefixQuery that uses
finite automatons only.
was:
When creating an automaton from a regexp, some operators can create more states
than maxDeterminizedStates:
<code>
a{10000}
</code>
The example above <b>creates</b> a single path with 10k states already
determinized so maxDeterminizedStates is not checked.
Some operations on automaton like Operations.isFinite or
Operations.topoSortStates are recursive and the maximum level of recursion
depends on the longest path in the automaton. So a large automaton like above
can exceed java's stack.
In most of the cases we are covered by maxDeterminizedStates but there will
always be adversarial cases where a large automaton is created from a small
input so I think we should also have safeguards in the recursive methods.
I've attached a patch that adds a max recursion level to Operations.isFinite
and Operations.topoSortStates in order to limit stack overflows. The limit is
set to 1000 so any automaton with a path bigger than 1000 would throw an
IllegalStateException.
The patch also uses maxDeterminizedStates to limit the number of states that a
repeat operator can create and throw a TooComplex..Exception when this limit is
reached.
Finally the patch adds the ability to skip Operations.isFinite on
AutomatonQuery and uses this as an optimization for PrefixQuery that uses
finite automatons only.
> Add safeguards to RegExp.toAutomaton and Operations
> ---------------------------------------------------
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Jim Ferenczi
>
> When creating an automaton from a regexp, some operators can create more
> states than maxDeterminizedStates:
> {code}
> a{10000}
> {code}
> The example above <b>creates</b> a single path with 10k states already
> determinized so maxDeterminizedStates is not checked.
> Some operations on automaton like Operations.isFinite or
> Operations.topoSortStates are recursive and the maximum level of recursion
> depends on the longest path in the automaton. So a large automaton like above
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will
> always be adversarial cases where a large automaton is created from a small
> input so I think we should also have safeguards in the recursive methods.
> I've attached a patch that adds a max recursion level to Operations.isFinite
> and Operations.topoSortStates in order to limit stack overflows. The limit is
> set to 1000 so any automaton with a path bigger than 1000 would throw an
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that
> a repeat operator can create and throw a TooComplex..Exception when this
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses
> finite automatons only.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]