Christian Ziech created LUCENE-8219:
---------------------------------------

             Summary: LevenshteinAutomata should estimate the number of states 
and transitions
                 Key: LUCENE-8219
                 URL: https://issues.apache.org/jira/browse/LUCENE-8219
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Christian Ziech


Currently the toAutomaton() method of the LevenshteinAutomata class uses the 
default constructor of the Automaton although it exactly knows how many states 
the automaton will have and can do a reasonable estimation on how many 
transitions it will need as well.

I suggest changing the lines {code:language=java|firstline=154|linenumbers=true}
// the number of states is based on the length of the word and n
int numStates = description.size();

Automaton a = new Automaton();
int lastState;
{code}
to
{code:language=java|firstline=154|linenumbers=true}
// the number of states is based on the length of the word and n
final int numStates = description.size();
final int numTransitions = numStates * Math.min(1 + 2 * n, alphabet.length);
final int prefixStates = prefix != null ? prefix.codePointCount(0, 
prefix.length()) : 0;

final Automaton a = new Automaton(numStates + prefixStates, numTransitions);
int lastState;
{code}
For my test data this cut down on the total amount of memory needed for int 
arrays by factor 4. The estimation of "1 + 2 * editDistance" should maybe 
rather be replaced by a value coming from the ParametricDescription used.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to