David Causse created LUCENE-6672:
------------------------------------
Summary: CompiledAutomaton can generate a binary automaton that
has more than 12*maxDeterminizedStates
Key: LUCENE-6672
URL: https://issues.apache.org/jira/browse/LUCENE-6672
Project: Lucene - Core
Issue Type: Bug
Components: core/index
Affects Versions: 4.10.3
Reporter: David Causse
The maxDeterminizedStates parameter to Automaton has introduced a way to
prevent massive states explosion during the generation of Automatas. This is a
nice feature to protect applications against DoS attacks. Unfortunately in some
cases like wildcard queries with a lot of wildcards the resulting binary
Automaton can exceed maxDeterminizedStates by a factor of ~12.
If I configure my application with the default maxDeterminizedStates to 10,000
CompiledAutomaton can potentially generate Automatas with more than 120,000
states.
This is because UTF32ToUTF8 ignores maxDeterminizedStates and can generate a
large binary automata that will be passed to the costly
Operations.getCommonSuffixBytesRef.
Current workaround is to set maxDeterminizedStates to expectedMaxStates/13.
I'm not sure what's the best way to fix this issue, UTF32ToUTF8.convert() uses
the Automaton.Builder which is very fast to create states, adding a check after
each state creation is maybe not the best idea.
A partial quick fix could be to check the size of the resulting binary automata
and fail before running the costly Operations.getCommonSuffixBytesRef.
Another fix would be to generalize maxDeterminizedStates to maxStates at the
Automaton.Builder level. The maxStates could be checked before costly
operations (before ArrayUtil.grow in addTransition and in finishState).
Unfortunately this one requires more refactoring (not included in the patch).
I included two patches to illustrate the above two fixes.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]