optimize lev automata construction
----------------------------------
Key: LUCENE-3094
URL: https://issues.apache.org/jira/browse/LUCENE-3094
Project: Lucene - Java
Issue Type: Improvement
Reporter: Robert Muir
Fix For: 4.0
in our lev automata algorithm, we compute an upperbound of the maximum possible
states (not the true number), and
create some "useless" unconnected states "floating around".
this isn't harmful, in the original impl we did the Automaton is simply a
pointer to the initial state, and all algorithms
traverse this list, so effectively the useless states were dropped immediately.
But recently we changed automaton to
cache its numberedStates, and we set them here, so these useless states are
being kept around.
it has no impact on performance, but can be really confusing if you are
debugging (e.g. toString). Thanks to Dawid Weiss
for noticing this.
at the same time, forcing an extra traversal is a bit scary, so i did some
benchmarking with really long strings and found
that actually its helpful to reduce() the number of transitions (typically cuts
them in half) for these long strings, as it
speeds up some later algorithms.
won't see any speedup for short terms, but I think its easier to work with
these simpler automata anyway, and it eliminates
the confusion of seeing the redundant states without slowing anything down.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]