[ 
https://issues.apache.org/jira/browse/LUCENE-8219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409687#comment-16409687
 ] 

ASF subversion and git services commented on LUCENE-8219:
---------------------------------------------------------

Commit 7cadada441d7e10fc1271cf5b68f76efcdfdbc9b in lucene-solr's branch 
refs/heads/master from [~kwri...@metacarta.com]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7cadada ]

LUCENE-8219: Do a better job of estimating automaton array sizes up front, to 
save on reallocation.  Committed on behalf of Christian Ziech.


> LevenshteinAutomata should estimate the number of states and transitions
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-8219
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8219
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Christian Ziech
>            Assignee: Karl Wright
>            Priority: Major
>
> Currently the toAutomaton() method of the LevenshteinAutomata class uses the 
> default constructor of the Automaton although it exactly knows how many 
> states the automaton will have and can do a reasonable estimation on how many 
> transitions it will need as well.
> I suggest changing the lines 
> {code:language=java|firstline=154|linenumbers=true}
> // the number of states is based on the length of the word and n
> int numStates = description.size();
> Automaton a = new Automaton();
> int lastState;
> {code}
> to
> {code:language=java|firstline=154|linenumbers=true}
> // the number of states is based on the length of the word and n
> final int numStates = description.size();
> final int numTransitions = numStates * Math.min(1 + 2 * n, alphabet.length);
> final int prefixStates = prefix != null ? prefix.codePointCount(0, 
> prefix.length()) : 0;
> final Automaton a = new Automaton(numStates + prefixStates, numTransitions);
> int lastState;
> {code}
> For my test data this cut down on the total amount of memory needed for int 
> arrays by factor 4. The estimation of "1 + 2 * editDistance" should maybe 
> rather be replaced by a value coming from the ParametricDescription used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to