+1

Now, is your proposal ONLY for the Solr query parser, or for the Lucene query parser as well? Or... for FuzzyQuery itself?

-- Jack Krupansky

-----Original Message-----
From: Yonik Seeley
Sent: Sunday, November 11, 2012 1:43 PM
To: [email protected]
Subject: Re: FuzzyQuery vs SlowFuzzyQuery docs? -- was: Re: [jira] [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.

Thinking about this a bit more, I'm somewhat sympathetic to the
performance arguments when the user is using the minSimilarity type of
parameter (i.e. a number less than 1) since it's not obvious what
algorithm would be invoked (i.e. what the resulting edit distance
requested is), and the user is not requesting a specific edit distance
in any case (it's fuzzy ;-)

When edit distance is used directly however (i.e. param >= 1), things
are both predictable and easy to document - there are no surprises.
So perhaps what makes the most sense is this:
- if minSimilarity < 1, then calculate the max edit distance based on
a parameter fuzzy.maxDistance or something (which would default to 2).
Use SlowFuzzyQuery if the result is >=3
- if minSimilarity is 1 or 2, use FuzzyQuery
- if minSimilarity is >=3, use SlowFuzzyQuery
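
For illustration, the three rules above could be sketched roughly as follows.
This is only a sketch under assumptions, not actual Solr or Lucene code: the
class FuzzyQueryChooser, the enum Kind, and the method names are hypothetical
stand-ins, and the similarity-to-edits conversion (1 - minSimilarity) * term
length is an assumed formula that the proposal does not spell out.

```java
/** Hypothetical helper: picks a fuzzy query implementation per the rules above. */
final class FuzzyQueryChooser {

    /** Stand-ins for Lucene's FuzzyQuery and SlowFuzzyQuery classes. */
    enum Kind { FUZZY_QUERY, SLOW_FUZZY_QUERY }

    /** Edit distances up to this value go to the fast FuzzyQuery. */
    static final int MAX_FAST_EDITS = 2;

    /**
     * Resolves the requested edit distance.
     * param >= 1 is taken as an explicit edit distance; param < 1 is taken
     * as a minimum similarity and converted (assumed formula), then capped
     * by fuzzyMaxDistance (the proposed fuzzy.maxDistance, default 2).
     */
    static int resolveEdits(float param, int termLength, int fuzzyMaxDistance) {
        if (param >= 1f) {
            return (int) param;
        }
        int edits = (int) ((1.0 - param) * termLength);
        return Math.min(edits, fuzzyMaxDistance);
    }

    /** Chooses FuzzyQuery for 2 or fewer edits, SlowFuzzyQuery for 3 or more. */
    static Kind choose(float param, int termLength, int fuzzyMaxDistance) {
        int edits = resolveEdits(param, termLength, fuzzyMaxDistance);
        return edits <= MAX_FAST_EDITS ? Kind.FUZZY_QUERY : Kind.SLOW_FUZZY_QUERY;
    }

    public static void main(String[] args) {
        System.out.println(choose(0.5f, 10, 2)); // similarity, capped at 2 edits
        System.out.println(choose(0.5f, 10, 5)); // similarity, cap raised to 5
        System.out.println(choose(2f, 10, 2));   // explicit distance 2
        System.out.println(choose(3f, 10, 2));   // explicit distance 3
    }
}
```

With the default cap of 2, a similarity-style parameter always resolves to the
fast path in this sketch; SlowFuzzyQuery is reached only when the cap is
raised or an explicit distance of 3 or more is given.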

-Yonik
http://lucidworks.com


On Sun, Nov 11, 2012 at 10:32 PM, Yonik Seeley <[email protected]> wrote:
On Sun, Nov 11, 2012 at 4:18 PM, Jack Krupansky <[email protected]> wrote:
Okay, so maybe this is simply a case where “an adjustment” was made to
Lucene and Solr did not make a corresponding “adjustment” to compensate to
“preserve” functionality. Solr users cannot easily override factory methods,
but of course the Solr query parser can and probably should.

Right - and Solr attempts to preserve external interfaces (HTTP apis
and query languages) even across major versions.
It could be argued that this is a regression - a loss of the ability
to use higher edit distances.
I'd support adding a fallback to SlowFuzzyQuery when the edit distance
turns out to be > 2.  I'd even argue that it should do it by default
to retain the old behavior.  Basically from the user perspective it
would look like edit distances of <= 2 were sped up.

-Yonik
http://lucidworks.com

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

