+1
Now, is your proposal ONLY for the Solr query parser, or for the Lucene
query parser as well? Or... for FuzzyQuery itself?
-- Jack Krupansky
-----Original Message-----
From: Yonik Seeley
Sent: Sunday, November 11, 2012 1:43 PM
To: [email protected]
Subject: Re: FuzzyQuery vs SlowFuzzyQuery docs? -- was: Re: [jira]
[Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.
Thinking about this a bit more, I'm somewhat sympathetic to the
performance arguments when the user is using the minSimilarity type of
parameter (i.e. a number less than 1) since it's not obvious what
algorithm would be invoked (i.e. what the resulting edit distance
requested is), and the user is not requesting a specific edit distance
in any case (it's fuzzy ;-)
When edit distance is used directly however (i.e. param >= 1), things
are both predictable and easy to document - there are no surprises.
So perhaps what makes the most sense is this:
- if minSimilarity < 1, then calculate the max edit distance based on
a parameter fuzzy.maxDistance or something (which would default to 2).
Use SlowFuzzyQuery if the result is >=3
- if minSimilarity is 1 or 2, use FuzzyQuery
- if minSimilarity is >=3, use SlowFuzzyQuery
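[Editor's sketch] The three rules above can be written down as a small selection routine. This is only an illustration, not code from Lucene or Solr: the class FuzzySelector, the enum Impl, and both method names are hypothetical, and the conversion from a fractional similarity to an edit count ((1 - similarity) * term length, truncated) is an assumption about how the mapping would work. The maxDistance argument stands in for the proposed fuzzy.maxDistance parameter.

```java
// Hypothetical sketch of the proposed dispatch; none of these names exist in Lucene or Solr.
class FuzzySelector {
    // Which query implementation the parser would build.
    enum Impl { FUZZY_QUERY, SLOW_FUZZY_QUERY }

    // Largest edit distance the fast FuzzyQuery is assumed to handle.
    static final int FAST_MAX_EDITS = 2;

    // Turn the user's fuzzy parameter into a whole number of edits.
    // param >= 1 is taken literally as an edit distance.
    // param < 1 is treated as a similarity and scaled by the term length
    // (an assumed formula), then capped by maxDistance.
    static int resolveEdits(float param, int termLength, int maxDistance) {
        if (param >= 1f) {
            return (int) param;
        }
        int edits = (int) ((1f - param) * termLength);
        return Math.min(edits, maxDistance);
    }

    // Fast path for 2 edits or fewer, slow path for 3 or more.
    static Impl choose(float param, int termLength, int maxDistance) {
        int edits = resolveEdits(param, termLength, maxDistance);
        return edits <= FAST_MAX_EDITS ? Impl.FUZZY_QUERY : Impl.SLOW_FUZZY_QUERY;
    }
}
```

With the default cap of 2, a fractional parameter can never reach the slow path; that only happens if the cap is raised, or if the user asks for an edit distance of 3 or more directly.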
-Yonik
http://lucidworks.com
On Sun, Nov 11, 2012 at 10:32 PM, Yonik Seeley <[email protected]> wrote:
On Sun, Nov 11, 2012 at 4:18 PM, Jack Krupansky <[email protected]>
wrote:
Okay, so maybe this is simply a case where “an adjustment” was made to
Lucene and Solr did not make a corresponding “adjustment” to compensate to
“preserve” functionality. Solr users cannot easily override factory methods,
but of course the Solr query parser can and probably should.
Right - and Solr attempts to preserve external interfaces (HTTP apis
and query languages) even across major versions.
It could be argued that this is a regression - a loss of the ability
to use higher edit distances.
I'd support adding a fallback to SlowFuzzyQuery when the edit distance
turns out to be > 2. I'd even argue that it should do it by default
to retain the old behavior. Basically from the user perspective it
would look like edit distances of <= 2 were sped up.
-Yonik
http://lucidworks.com
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]