+1, but with a harsh warning, maybe even in the log? In other words, it's not just a performance issue but also an accuracy one. I believe it breaks (in some manner) if the number of matching words passes 32k.
--
Mark Bennett / New Idea Engineering, Inc. / [email protected]
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513

On Sun, Nov 11, 2012 at 1:43 PM, Yonik Seeley <[email protected]> wrote:

> Thinking about this a bit more, I'm somewhat sympathetic to the
> performance arguments when the user is using the minSimilarity type of
> parameter (i.e. a number less than 1) since it's not obvious what
> algorithm would be invoked (i.e. what the resulting edit distance
> requested is), and the user is not requesting a specific edit distance
> in any case (it's fuzzy ;-)
>
> When edit distance is used directly however (i.e. param >= 1), things
> are both predictable and easy to document - there are no surprises.
> So perhaps what makes the most sense is this:
> - if minSimilarity < 1, then calculate the max edit distance based on
>   a parameter fuzzy.maxDistance or something (which would default to 2).
>   Use SlowFuzzyQuery if the result is >=3
> - if minSimilarity is 1 or 2, use FuzzyQuery
> - if minSimilarity is >=3, use SlowFuzzyQuery
>
> -Yonik
> http://lucidworks.com
>
>
> On Sun, Nov 11, 2012 at 10:32 PM, Yonik Seeley <[email protected]> wrote:
> > On Sun, Nov 11, 2012 at 4:18 PM, Jack Krupansky <[email protected]> wrote:
> >> Okay, so maybe this is simply a case where “an adjustment” was made to
> >> Lucene and Solr did not make a corresponding “adjustment” to compensate to
> >> “preserve” functionality. Solr users cannot easily override factory methods,
> >> but of course the Solr query parser can and probably should.
> >
> > Right - and Solr attempts to preserve external interfaces (HTTP apis
> > and query languages) even across major versions.
> > It could be argued that this is a regression - a loss of the ability
> > to use higher edit distances.
> > I'd support adding a fallback to SlowFuzzyQuery when the edit distance
> > turns out to be > 2. I'd even argue that it should do it by default
> > to retain the old behavior. Basically from the user perspective it
> > would look like edit distances of <= 2 were sped up.
> >
> > -Yonik
> > http://lucidworks.com
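
For reference, the selection rule Yonik proposes in the quoted message could be sketched roughly as below. This is only one reading of the proposal, not Solr's actual code: the class and method names are hypothetical, the conversion from a similarity below 1 to an edit count (scaling by term length, then capping at the fuzzy.maxDistance value) is an assumption, and the method just reports which query type would be picked rather than building a real Lucene query.

```java
public final class FuzzyQueryChooser {

    /** Which implementation the proposed rule would select. */
    public enum Impl { FUZZY_QUERY, SLOW_FUZZY_QUERY }

    /** Largest edit distance the thread describes the fast FuzzyQuery as handling. */
    static final int MAX_FAST_EDITS = 2;

    /**
     * Turns the user's fuzzy parameter into an edit count.
     * param >= 1 is taken as an explicit edit distance.
     * param < 1 is treated as a similarity; the scaling by term length is an
     * assumption, and maxDistance stands in for the proposed fuzzy.maxDistance.
     */
    static int editsFor(float param, int termLength, int maxDistance) {
        if (param >= 1f) {
            return (int) param;
        }
        int scaled = (int) ((1.0 - param) * termLength);
        return Math.min(scaled, maxDistance);
    }

    /** Picks the fast path for up to 2 edits, the slow path for 3 or more. */
    static Impl choose(float param, int termLength, int maxDistance) {
        int edits = editsFor(param, termLength, maxDistance);
        return edits <= MAX_FAST_EDITS ? Impl.FUZZY_QUERY : Impl.SLOW_FUZZY_QUERY;
    }
}
```

With the default cap of 2, a similarity-style parameter never reaches the slow path in this sketch; it only does so if someone raises the cap to 3 or more, or asks for an edit distance of 3 or more directly.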
