[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kevin Lawson updated LUCENE-4947:
---------------------------------

    Attachment: MDAG-master.zip
                LevenshteinAutomaton-master.zip

LevenshteinAutomaton-master.zip MD5 checksum: 081b417edbd7d2a562085e1c0dfb0a4c
MDAG-master.zip MD5 checksum: 109e99dca700e02d1ad54306688472a5

> Java implementation (and improvement) of Levenshtein & associated lexicon automata
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENE-4947
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4947
>             Project: Lucene - Core
>          Issue Type: Improvement
>    Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1
>            Reporter: Kevin Lawson
>         Attachments: LevenshteinAutomaton-master.zip, MDAG-master.zip
>
>
> I was encouraged by Mike McCandless to open an issue concerning this after I
> contacted him privately about it. Thanks Mike!
> I'd like to submit my Java implementation of the Levenshtein automaton as a
> homogeneous replacement for the current heterogeneous, multi-component
> implementation in Lucene.
> Benefits of upgrading include:
> - Reduced code complexity
> - Better performance from components that were previously implemented in
> Python
> - Support for on-the-fly dictionary-automaton manipulation (if you wish to
> use my dictionary-automaton implementation)
> The code for all the components is well structured, easy to follow, and
> extensively commented. It has also been fully tested for correct
> functionality and performance.
> The Levenshtein automaton implementation (along with the required MDAG
> reference) can be found in my LevenshteinAutomaton Java library here:
> https://github.com/klawson88/LevenshteinAutomaton
> The minimalistic directed acyclic graph (MDAG) which the automaton code uses
> to store and step through word sets can be found here:
> https://github.com/klawson88/MDAG
> *Transpositions aren't currently implemented. I hope that the
> comment-filled, editing-friendly code, combined with the fact that the
> section in the Mihov paper detailing transpositions is only 2 pages long,
> makes adding the functionality trivial.
> *As a result of support for on-the-fly manipulation, the MDAG
> (dictionary-automaton) creation process incurs a slight speed penalty. To
> get the best of both worlds, I'd recommend adding a constructor that only
> takes sorted input. The complete, easy-to-follow pseudo-code for this simple
> procedure can be found in the first article linked under the references
> section in the MDAG repository.
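
For readers unfamiliar with the technique, here is a minimal sketch of the question a Levenshtein automaton answers: is an input word within a fixed number of edits of a target word? It is not the code from the attached library and not Lucene's implementation. The class name LevenshteinSketch and all of its methods are hypothetical, and instead of precomputing states it simulates the automaton by carrying one row of clamped edit distances as the state. Like the submission described above, it does not treat a transposition as a single edit.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not the attached library's API): simulates a
 * Levenshtein automaton for one target word by using a row of edit
 * distances, clamped at maxEdits + 1, as the automaton state.
 */
final class LevenshteinSketch {
    private final String target;
    private final int maxEdits;

    LevenshteinSketch(String target, int maxEdits) {
        this.target = target;
        this.maxEdits = maxEdits;
    }

    /** Initial state: distance from the empty input to each target prefix. */
    int[] start() {
        int[] row = new int[target.length() + 1];
        for (int i = 0; i < row.length; i++) {
            row[i] = Math.min(i, maxEdits + 1);
        }
        return row;
    }

    /** Transition: consume one input character and return the next state. */
    int[] step(int[] prev, char c) {
        int[] next = new int[prev.length];
        next[0] = Math.min(prev[0] + 1, maxEdits + 1);
        for (int i = 1; i < next.length; i++) {
            int substitution = prev[i - 1] + (target.charAt(i - 1) == c ? 0 : 1);
            int extraInputChar = prev[i] + 1;        // c has no counterpart in the target
            int skippedTargetChar = next[i - 1] + 1; // target char has no counterpart in the input
            int best = Math.min(substitution, Math.min(extraInputChar, skippedTargetChar));
            next[i] = Math.min(best, maxEdits + 1);  // clamping keeps the state space finite
        }
        return next;
    }

    /** Accepting state: the whole target is reachable within maxEdits. */
    boolean isAccepting(int[] state) {
        return state[state.length - 1] <= maxEdits;
    }

    /** Dead-state check: false once no extension of the input can match. */
    boolean canStillMatch(int[] state) {
        return Arrays.stream(state).min().getAsInt() <= maxEdits;
    }

    /** Runs the simulated automaton over a whole word. */
    boolean accepts(String word) {
        int[] state = start();
        for (int i = 0; i < word.length(); i++) {
            state = step(state, word.charAt(i));
            if (!canStillMatch(state)) {
                return false; // prune early, as a dictionary traversal would
            }
        }
        return isAccepting(state);
    }
}
```

With a target of "food" and one allowed edit, accepts("flood") is true. With a target of "ab" and one allowed edit, accepts("ba") is false, because a swap of adjacent characters counts as two edits when transpositions are not supported. The canStillMatch check is what would let a walk over a dictionary automaton such as the MDAG abandon a branch as soon as no word below it can match.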