Thinking about this a bit more, I'm somewhat sympathetic to the performance arguments when the user is using the minSimilarity type of parameter (i.e. a number less than 1) since it's not obvious what algorithm would be invoked (i.e. what the resulting edit distance requested is), and the user is not requesting a specific edit distance in any case (it's fuzzy ;-)
When edit distance is used directly, however (i.e. param >= 1), things are both predictable and easy to document - there are no surprises. So perhaps what makes the most sense is this:

- if minSimilarity < 1, then calculate the max edit distance based on a parameter fuzzy.maxDistance or something (which would default to 2). Use SlowFuzzyQuery if the result is >= 3
- if minSimilarity is 1 or 2, use FuzzyQuery
- if minSimilarity is >= 3, use SlowFuzzyQuery

-Yonik
http://lucidworks.com

On Sun, Nov 11, 2012 at 10:32 PM, Yonik Seeley <[email protected]> wrote:
> On Sun, Nov 11, 2012 at 4:18 PM, Jack Krupansky <[email protected]> wrote:
>> Okay, so maybe this is simply a case where “an adjustment” was made to
>> Lucene and Solr did not make a corresponding “adjustment” to compensate to
>> “preserve” functionality. Solr users cannot easily override factory methods,
>> but of course the Solr query parser can and probably should.
>
> Right - and Solr attempts to preserve external interfaces (HTTP apis
> and query languages) even across major versions.
> It could be argued that this is a regression - a loss of the ability
> to use higher edit distances.
> I'd support adding a fallback to SlowFuzzyQuery when the edit distance
> turns out to be > 2. I'd even argue that it should do it by default
> to retain the old behavior. Basically from the user perspective it
> would look like edit distances of <= 2 were sped up.
>
> -Yonik
> http://lucidworks.com
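For reference, the dispatch rule proposed at the top of this message could be sketched roughly as below. This is only an illustration under stated assumptions, not the Solr query parser's code: the class, enum, and method names are hypothetical, and no Lucene classes are used. In particular, deriving the distance as (1 - minSimilarity) * termLength and capping it at fuzzy.maxDistance is one reading of "calculate the max edit distance based on a parameter".

```java
// Hypothetical sketch; the names here are illustrative, not Solr or Lucene API.
final class FuzzyDispatch {
    enum Kind { FUZZY_QUERY, SLOW_FUZZY_QUERY }

    // Assumed default for the proposed fuzzy.maxDistance parameter.
    static final int DEFAULT_MAX_DISTANCE = 2;
    // Largest edit distance routed to the fast FuzzyQuery in this proposal.
    static final int FAST_LIMIT = 2;

    // Turns the user-supplied parameter into an edit distance.
    // param >= 1 is taken as an explicit edit distance; param < 1 is a
    // similarity, converted with an assumed formula and capped at maxDistance.
    static int requestedEdits(float param, int termLength, int maxDistance) {
        if (param >= 1f) {
            return (int) param;
        }
        int derived = (int) ((1.0 - param) * termLength);
        return Math.min(derived, maxDistance);
    }

    // Picks the query implementation: fast for <= 2 edits, slow fallback otherwise.
    static Kind choose(float param, int termLength, int maxDistance) {
        int edits = requestedEdits(param, termLength, maxDistance);
        return edits > FAST_LIMIT ? Kind.SLOW_FUZZY_QUERY : Kind.FUZZY_QUERY;
    }

    public static void main(String[] args) {
        System.out.println(choose(2f, 6, DEFAULT_MAX_DISTANCE));    // explicit distance 2
        System.out.println(choose(3f, 6, DEFAULT_MAX_DISTANCE));    // explicit distance 3
        System.out.println(choose(0.5f, 10, DEFAULT_MAX_DISTANCE)); // similarity, capped at 2
        System.out.println(choose(0.5f, 10, 5));                    // similarity, cap raised to 5
    }
}
```

With the default cap of 2, a similarity-style parameter never reaches the slow path; it only does so if fuzzy.maxDistance is raised to 3 or more, while an explicit distance of 3 or more always does.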
