[
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kevin Lawson updated LUCENE-4947:
---------------------------------
Attachment: MDAG-master.zip
LevenshteinAutomaton-master.zip
LevenshteinAutomaton-master.zip MD5 checksum: 081b417edbd7d2a562085e1c0dfb0a4c
MDAG-master.zip MD5 checksum: 109e99dca700e02d1ad54306688472a5
> Java implementation (and improvement) of Levenshtein & associated lexicon
> automata
> ----------------------------------------------------------------------------------
>
> Key: LUCENE-4947
> URL: https://issues.apache.org/jira/browse/LUCENE-4947
> Project: Lucene - Core
> Issue Type: Improvement
> Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1
> Reporter: Kevin Lawson
> Attachments: LevenshteinAutomaton-master.zip, MDAG-master.zip
>
>
> I was encouraged by Mike McCandless to open an issue concerning this after I
> contacted him privately about it. Thanks Mike!
> I'd like to submit my Java implementation of the Levenshtein Automaton as a
> homogeneous replacement for the current heterogeneous, multi-component
> implementation in Lucene.
> Benefits of upgrading include:
> - Reduced code complexity
> - Better performance from components that were previously implemented in
> Python
> - Support for on-the-fly dictionary-automaton manipulation (if you wish to
> use my dictionary-automaton implementation)
> The code for all the components is well structured, easy to follow, and
> extensively commented. It has also been fully tested for correct
> functionality and performance.
> The Levenshtein automaton implementation (along with the required MDAG
> reference) can be found in my LevenshteinAutomaton Java library here:
> https://github.com/klawson88/LevenshteinAutomaton
> The minimalistic directed acyclic graph (MDAG) which the automaton code uses
> to store and step through word sets can be found here:
> https://github.com/klawson88/MDAG
> *Transpositions aren't currently implemented. I hope that the comment-filled,
> editing-friendly code, combined with the fact that the section in the Mihov
> paper detailing transpositions is only 2 pages long, makes adding the
> functionality trivial.
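To make concrete what transposition support changes, here is a minimal
reference sketch. It is a plain dynamic-programming distance, not an automaton
and not code from the LevenshteinAutomaton library; the class and method names
are hypothetical. It assumes the "optimal string alignment" definition, in which
swapping two adjacent characters costs one edit; whether that is exactly the
variant described in the Mihov paper is an assumption.

```java
// Hypothetical reference class; not part of LevenshteinAutomaton or Lucene.
final class TranspositionDistanceSketch {

    /** Classic Levenshtein distance: insertions, deletions, substitutions. */
    static int levenshtein(String a, String b) {
        return distance(a, b, false);
    }

    /** Same, but an adjacent-character swap also counts as a single edit. */
    static int withTranspositions(String a, String b) {
        return distance(a, b, true);
    }

    private static int distance(String a, String b, boolean allowTranspositions) {
        // d[i][j] = distance between the first i chars of a and the first j chars of b
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;

        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), // delete / insert
                        d[i - 1][j - 1] + cost);                    // match / substitute
                if (allowTranspositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);     // adjacent swap
                }
                d[i][j] = best;
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("lucene", "lucnee"));        // 2
        System.out.println(withTranspositions("lucene", "lucnee")); // 1
    }
}
```

A function like this can serve as a test oracle: every word an automaton with
transpositions accepts for distance n should satisfy
withTranspositions(query, word) <= n under the definition assumed here.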
> *As a result of support for on-the-fly manipulation, the MDAG
> (dictionary-automaton) creation process incurs a slight speed penalty. In
> order to have the best of both worlds, I'd recommend the addition of a
> constructor which only takes sorted input. The complete, easy-to-follow
> pseudo-code for the simple procedure can be found in the first article
> linked under the references section in the MDAG repository.
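For readers unfamiliar with sorted-input construction, the sketch below shows
the general idea. It follows the widely described incremental algorithm for
sorted word lists (Daciuk et al.); that this is the procedure in the article
referenced by the MDAG repository is an assumption. The class and method names
are hypothetical and are not the MDAG API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch; not the MDAG library's API.
final class SortedInputDawgSketch {
    private static final class Node {
        final int id;
        final TreeMap<Character, Node> edges = new TreeMap<>();
        boolean accepting;
        Node(int id) { this.id = id; }

        // Two nodes are equivalent if they agree on the accepting flag and on
        // every (label, canonical target) pair.
        String signature() {
            StringBuilder sb = new StringBuilder(accepting ? "1" : "0");
            for (Map.Entry<Character, Node> e : edges.entrySet()) {
                sb.append(e.getKey()).append(':').append(e.getValue().id).append(';');
            }
            return sb.toString();
        }
    }

    private int nextId = 0;
    private final Node root = new Node(nextId++);
    private final Map<String, Node> register = new HashMap<>();
    private String previous = null;
    private boolean finished = false;

    /** Adds a word; words must arrive in strictly increasing String order. */
    void add(String word) {
        if (finished) throw new IllegalStateException("finish() already called");
        if (previous != null && word.compareTo(previous) <= 0) {
            throw new IllegalArgumentException("input must be sorted and unique: " + word);
        }
        // Follow the prefix shared with the words added so far.
        Node state = root;
        int i = 0;
        while (i < word.length()) {
            Node next = state.edges.get(word.charAt(i));
            if (next == null) break;
            state = next;
            i++;
        }
        // The branch under the previous word can no longer change: minimize it.
        if (!state.edges.isEmpty()) replaceOrRegister(state);
        // Append the remaining suffix as a fresh chain of nodes.
        for (; i < word.length(); i++) {
            Node fresh = new Node(nextId++);
            state.edges.put(word.charAt(i), fresh);
            state = fresh;
        }
        state.accepting = true;
        previous = word;
    }

    /** Minimizes the path of the last word; call once after the final add(). */
    void finish() {
        if (!finished && !root.edges.isEmpty()) replaceOrRegister(root);
        finished = true;
    }

    private void replaceOrRegister(Node state) {
        Map.Entry<Character, Node> last = state.edges.lastEntry();
        Node child = last.getValue();
        if (!child.edges.isEmpty()) replaceOrRegister(child);
        Node existing = register.putIfAbsent(child.signature(), child);
        if (existing != null) state.edges.put(last.getKey(), existing); // share it
    }

    boolean contains(String word) {
        Node state = root;
        for (int i = 0; i < word.length() && state != null; i++) {
            state = state.edges.get(word.charAt(i));
        }
        return state != null && state.accepting;
    }

    /** Number of distinct states reachable from the root. */
    int stateCount() {
        Set<Node> seen = new HashSet<>();
        Deque<Node> stack = new ArrayDeque<>();
        seen.add(root);
        stack.push(root);
        while (!stack.isEmpty()) {
            for (Node n : stack.pop().edges.values()) {
                if (seen.add(n)) stack.push(n);
            }
        }
        return seen.size();
    }
}
```

Adding "tap", "taps", "top", "tops" in that order and then calling finish()
yields 5 states, where an unminimized trie for the same words needs 8. Because
each word only ever touches the path of the word before it, no general
equivalence search over the whole graph is needed, which is where the speed
advantage of a sorted-only constructor would come from.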