[ https://issues.apache.org/jira/browse/LUCENE-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thomas Morton updated LUCENE-1550:
----------------------------------

    Attachment: LUCENE-1550.patch

2 seems a reasonable default. Experiments in the paper show comparable results for bi-grams and tri-grams. I added an empty constructor which sets n=2.

Yes, that can be moved up without penalty.

That's a bug in the empty case. It should return 0 unless both strings are empty. I ported this bug from the LevensteinDistance code. It's now fixed in both and has unit tests in both.

New patch attached.

Technically NGramDistance(1) is the same thing as LevensteinDistance, but the LevensteinDistance code is more straightforward and may be slightly faster.

> Add N-Gram String Matching for Spell Checking
> ---------------------------------------------
>
>                 Key: LUCENE-1550
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1550
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/spellchecker
>    Affects Versions: 2.9
>            Reporter: Thomas Morton
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1550.patch, LUCENE-1550.patch
>
>
> N-Gram version of edit distance based on paper by Grzegorz Kondrak, "N-gram
> similarity and distance". Proceedings of the Twelfth International Conference
> on String Processing and Information Retrieval (SPIRE 2005), pp. 115-126,
> Buenos Aires, Argentina, November 2005.
> http://www.cs.ualberta.ca/~kondrak/papers/spire05.pdf
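
For readers following the empty-case fix described in the comment, here is a minimal sketch of the intended behaviour. It is not the code from the attached patch: the class name `EditSimilaritySketch` and the method `similarity` are hypothetical, and the body is a plain unigram edit distance normalised to [0, 1], which is assumed here to stand in for the n=1 case. The point is only the guard at the top: the score is 0 when exactly one string is empty and 1 when both are.

```java
// Illustrative sketch only; names are hypothetical and this is not the patch itself.
final class EditSimilaritySketch {

    private EditSimilaritySketch() {}

    /** Returns a similarity in [0, 1]; 1 means identical, 0 means nothing in common. */
    static float similarity(String source, String target) {
        int sl = source.length();
        int tl = target.length();

        // Empty case: return 0 unless both strings are empty.
        // Without this guard, two empty strings would divide by zero below.
        if (sl == 0 || tl == 0) {
            return (sl == tl) ? 1.0f : 0.0f;
        }

        // Two-row dynamic programme for the unigram edit distance.
        int[] prev = new int[tl + 1];
        int[] curr = new int[tl + 1];
        for (int j = 0; j <= tl; j++) {
            prev[j] = j; // cost of building target[0..j) from an empty prefix
        }

        for (int i = 1; i <= sl; i++) {
            curr[0] = i; // cost of deleting source[0..i)
            for (int j = 1; j <= tl; j++) {
                int cost = source.charAt(i - 1) == target.charAt(j - 1) ? 0 : 1;
                int deletion = prev[j] + 1;
                int insertion = curr[j - 1] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(deletion, insertion), substitution);
            }
            // Swap rows so that prev always holds the most recently completed row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }

        // Normalise the distance by the longer length and turn it into a similarity.
        return 1.0f - (float) prev[tl] / Math.max(sl, tl);
    }
}
```

Under these assumptions, `similarity("", "")` gives 1, `similarity("", "abc")` gives 0, and identical non-empty strings give 1. Unit tests covering those three cases are the kind of check the comment refers to.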