[
https://issues.apache.org/jira/browse/LUCENE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571040#action_12571040
]
Grant Ingersoll commented on LUCENE-1183:
-----------------------------------------
It occurs to me that we apparently have two different implementations of
Levenshtein, one in spellchecker and one for FuzzyQuery. I haven't analyzed
them individually to know for sure, but if this is a much better
implementation, then we should think about using it for FuzzyQuery, too.
The FuzzyQuery (FuzzyTermEnum) version claims to have a fast-fail mechanism,
too:
{quote}
Embedded within this algorithm is a fail-fast Levenshtein distance
algorithm. The fail-fast algorithm differs from the standard Levenshtein
distance algorithm in that it is aborted if it is discovered that the
minimum distance between the words is greater than some threshold.
{quote}
Cedrik, since you seem to know about these things, would you have time to look
at FuzzyTermEnum? A 3x speedup there would be great for users, too.
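
For reference, the fail-fast idea described in the quoted Javadoc can be sketched roughly as follows. This is only an illustration of the general technique, not the FuzzyTermEnum code: the class name BoundedLevenshtein, the method signature, and the -1 return convention are all assumptions made for the example.

```java
// Hypothetical helper for illustration; not taken from Lucene's FuzzyTermEnum.
final class BoundedLevenshtein {
    private BoundedLevenshtein() {}

    /**
     * Returns the Levenshtein distance between s and t, or -1 as soon as it
     * is certain that the distance exceeds maxDistance.
     */
    static int distance(String s, String t, int maxDistance) {
        int n = s.length();
        int m = t.length();
        // The length difference is a lower bound on the edit distance.
        if (Math.abs(n - m) > maxDistance) {
            return -1;
        }
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i;
            int rowMin = curr[0];
            char c = s.charAt(i - 1);
            for (int j = 1; j <= m; j++) {
                int cost = (c == t.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Row minima never decrease from one row to the next, so once
            // every cell in a row is above the threshold, the final
            // distance must be above it too and we can abort.
            if (rowMin > maxDistance) {
                return -1;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m] <= maxDistance ? prev[m] : -1;
    }
}
```

With a small threshold, most non-matching terms are rejected after a few rows (or by the length check alone) instead of filling the whole table.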
> TRStringDistance uses way too much memory (with patch)
> ------------------------------------------------------
>
> Key: LUCENE-1183
> URL: https://issues.apache.org/jira/browse/LUCENE-1183
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/*
> Affects Versions: 1.9, 2.0.0, 2.1, 2.2, 2.3
> Reporter: Cédrik LIME
> Attachments: TRStringDistance.java, TRStringDistance.patch
>
> Original Estimate: 0.17h
> Remaining Estimate: 0.17h
>
> The implementation of TRStringDistance is based on version 2.1 of
> org.apache.commons.lang.StringUtils#getLevenshteinDistance(String, String),
> which uses an un-optimized implementation of the Levenshtein Distance
> algorithm (it uses way too much memory). Please see Bug 38911
> (http://issues.apache.org/bugzilla/show_bug.cgi?id=38911) for more
> information.
> The commons-lang implementation has been heavily optimized as of version 2.2
> (3x speed-up). I have ported the new implementation to TRStringDistance.
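
To make the memory complaint in the description concrete, here is a rough sketch contrasting a full-matrix Levenshtein computation with a two-row one. It illustrates the general technique only and is not a copy of the attached patch or of the commons-lang code; the class and method names are invented for the example.

```java
// Hypothetical sketch; names are made up for illustration.
final class LevenshteinMemorySketch {
    private LevenshteinMemorySketch() {}

    /** Full-matrix version: allocates (n + 1) * (m + 1) ints per call. */
    static int fullMatrix(String s, String t) {
        int n = s.length();
        int m = t.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i;
        for (int j = 0; j <= m; j++) d[0][j] = j;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = (s.charAt(i - 1) == t.charAt(j - 1)) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[n][m];
    }

    /** Two-row version: allocates 2 * (min(n, m) + 1) ints per call. */
    static int twoRows(String s, String t) {
        // Make t the shorter string so the rows are as short as possible.
        if (s.length() < t.length()) {
            String tmp = s;
            s = t;
            t = tmp;
        }
        int n = s.length();
        int m = t.length();
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) prev[j] = j;
        for (int i = 1; i <= n; i++) {
            curr[0] = i;
            for (int j = 1; j <= m; j++) {
                int cost = (s.charAt(i - 1) == t.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            // Only the previous row is needed for the next iteration.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }
}
```

Both methods return the same distance; the second only keeps the two most recent rows of the table, which is what removes the quadratic allocation.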