David Causse created LUCENE-6672:
------------------------------------

             Summary: CompiledAutomaton can generate a binary automaton that 
has more than 12*maxDeterminizedStates
                 Key: LUCENE-6672
                 URL: https://issues.apache.org/jira/browse/LUCENE-6672
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/index
    Affects Versions: 4.10.3
            Reporter: David Causse


The maxDeterminizedStates parameter to Automaton introduced a way to 
prevent massive state explosion during the generation of automata. This is a 
nice feature to protect applications against DoS attacks. Unfortunately, in some 
cases, such as wildcard queries with many wildcards, the resulting binary 
automaton can exceed maxDeterminizedStates by a factor of ~12.
If I configure my application with the default maxDeterminizedStates of 10,000, 
CompiledAutomaton can potentially generate automata with more than 120,000 
states.

This is because UTF32ToUTF8 ignores maxDeterminizedStates and can generate a 
large binary automaton that is then passed to the costly 
Operations.getCommonSuffixBytesRef.

The current workaround is to set maxDeterminizedStates to expectedMaxStates/13.
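
The arithmetic behind this workaround can be sketched as follows. This is only 
an illustration: StateBudget and its methods are hypothetical names, not Lucene 
API, and the factor of 12 is the approximate blow-up reported above rather than 
a guaranteed bound. The divisor 13 is taken from the workaround as stated; it 
presumably leaves some margin over the observed factor.

```java
// Hypothetical helper, not part of Lucene: derives the maxDeterminizedStates
// value to configure from a budget on the final binary (UTF-8) automaton.
final class StateBudget {
    // Approximate blow-up factor reported in this issue (an observation, not a bound).
    static final int OBSERVED_FACTOR = 12;
    // Divisor used by the workaround described in this issue.
    static final int WORKAROUND_DIVISOR = 13;

    // Value to pass as maxDeterminizedStates for a given state budget.
    static int maxDeterminizedStatesFor(int expectedMaxStates) {
        if (expectedMaxStates < WORKAROUND_DIVISOR) {
            throw new IllegalArgumentException(
                "expectedMaxStates must be at least " + WORKAROUND_DIVISOR);
        }
        return expectedMaxStates / WORKAROUND_DIVISOR; // integer division
    }

    // Rough size the binary automaton may reach for a given limit.
    static long approxBinaryStates(int maxDeterminizedStates) {
        return (long) OBSERVED_FACTOR * maxDeterminizedStates;
    }
}
```

For example, with a budget of 120,000 states the helper yields a limit of 9,230, 
and with the default limit of 10,000 the estimate is 120,000 binary states.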

I'm not sure what the best way to fix this issue is. UTF32ToUTF8.convert() uses 
Automaton.Builder, which creates states very quickly, so adding a check after 
each state creation is maybe not the best idea.

A partial quick fix could be to check the size of the resulting binary automaton 
and fail before running the costly Operations.getCommonSuffixBytesRef.
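
A minimal sketch of such a guard, written as standalone Java so it does not 
depend on Lucene. BinaryAutomatonGuard, TooManyStatesException and the maxStates 
parameter are hypothetical names; the limit to compare against is an assumption 
and is left to the caller. In CompiledAutomaton the check would presumably run 
on the state count of the automaton returned by UTF32ToUTF8.convert(), before 
the common suffix is computed.

```java
// Hypothetical guard, not Lucene code: fails fast when the converted binary
// automaton is larger than the caller is willing to process.
final class BinaryAutomatonGuard {

    // Hypothetical exception type used only by this sketch.
    static final class TooManyStatesException extends RuntimeException {
        TooManyStatesException(int numStates, int maxStates) {
            super("binary automaton has " + numStates
                + " states, limit is " + maxStates);
        }
    }

    // binaryNumStates: state count of the automaton after UTF-32 to UTF-8 conversion.
    // maxStates: caller-chosen limit (an assumption; the issue does not fix a value).
    static void checkSize(int binaryNumStates, int maxStates) {
        if (binaryNumStates > maxStates) {
            throw new TooManyStatesException(binaryNumStates, maxStates);
        }
    }
}
```

This only bounds the cost of the later steps. The conversion itself still runs 
unbounded, which is why the fix is only partial.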

Another fix would be to generalize maxDeterminizedStates to maxStates at the 
Automaton.Builder level. The maxStates could be checked before costly 
operations (before ArrayUtil.grow in addTransition and in finishState). 
Unfortunately this one requires more refactoring (not included in the patch).
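
The idea can be illustrated with a toy builder. This is not the real 
Automaton.Builder: BoundedBuilder, its flat int[] layout and the exception it 
throws are all assumptions made for the sketch, and java.util.Arrays.copyOf 
stands in for ArrayUtil.grow. State creation stays cheap and unchecked; the 
limit is only tested right before the array is grown and when the build is 
finished.

```java
// Toy stand-in for a builder with a maxStates limit (hypothetical, not Lucene code).
final class BoundedBuilder {
    private final int maxStates;
    private int numStates;
    // Flat triples (source, dest, label); kept tiny so growth happens early.
    private int[] transitions = new int[3];
    private int used;

    BoundedBuilder(int maxStates) {
        this.maxStates = maxStates;
    }

    // Cheap: no limit check on the hot path.
    int createState() {
        return numStates++;
    }

    void addTransition(int source, int dest, int label) {
        if (used + 3 > transitions.length) {
            checkMaxStates(); // check before the costly growth
            transitions = java.util.Arrays.copyOf(transitions, transitions.length * 2);
        }
        transitions[used++] = source;
        transitions[used++] = dest;
        transitions[used++] = label;
    }

    // Final check before handing the automaton to later, costlier operations.
    int finish() {
        checkMaxStates();
        return numStates;
    }

    private void checkMaxStates() {
        if (numStates > maxStates) {
            throw new IllegalStateException(
                "automaton has " + numStates + " states, limit is " + maxStates);
        }
    }
}
```

Because the check is tied to array growth, the builder can overshoot the limit 
between two growth points, but the overshoot is caught at the next growth or in 
finish() at the latest.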

I included two patches to illustrate the above two fixes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
