FWIW, I integrated sourceforge's "SecondString" algos (http://secondstring.sourceforge.net/javadoc ) and others using a callout interface which boiled down to:
float getDifference(String a, String b) This seemed to be the cleanest lowest-common-denominator standard for plugging in string comparison algos where each implementation produces a score between 0 and 1. I didn't find anything in the algorithm alternatives which was particularly faster or better than the existing edit distance impl but I don't think it hurts to enable pluggability. On a related topic - if effort is being spent refactoring FuzzyQuery I would re-raise the more pressing concern that the default IDF ranking in here is broken because it favours rare terms (ie misspellings). See contrib/queries/....FuzzyLikeThisQuery for alternative scoring strategies or the thread here: http://www.mail-archive.com/java-user@lucene.apache.org/msg05811.html Cheers Mark ----- Original Message ---- From: eks dev <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Thursday, 8 June, 2006 3:12:04 PM Subject: Re: Edit-distance strategy what could show measurably faster results is more in Jaro distance direction, but even than, the fact that you need to scan all terms in dictionary will make it prohibitive for large colletions. For small collections this can be packed in some smart data structures to restrict number of distance calculations... good luck with it ----- Original Message ---- From: eks dev <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Thursday, 8 June, 2006 4:07:07 PM Subject: Re: Edit-distance strategy >I'm about to replace the edit-distance algorithm in FuzzyQuery from >Levenstein to Hirschberg to save a couple of clockticks. Have you allready confirmed Hirschberg algorithm to be faster than current implementation of edit distance? I am not convinced it helps really. Hirschberg and standard DP algorithm both have O(|s1|*|s2|) time. The only saving is that Hirschberg needs O(|s2|) space using binary recursion (classic needs O(|s1|*|s2|) space). --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]