[
https://issues.apache.org/jira/browse/LUCENE-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir resolved LUCENE-2667.
---------------------------------
Resolution: Fixed
Committed revision 1002214
> Fix FuzzyQuery's defaults, so it's fast.
> ----------------------------------------
>
> Key: LUCENE-2667
> URL: https://issues.apache.org/jira/browse/LUCENE-2667
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Affects Versions: 4.0
> Reporter: Robert Muir
> Assignee: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-2667.patch, LUCENE-2667.patch
>
>
> We worked a lot on FuzzyQuery, but you need to be a rocket scientist to
> ensure good results.
> The main problem is that the default distance is 0.5f, which doesn't take
> into account the length of the string.
> To add insult to injury, the default number of expansions is 1024
> (traditionally taken from BooleanQuery's maxClauseCount).
> I propose:
> * The syntax of FuzzyQuery is enhanced, so that you can specify raw edits
> too, such as foobar~2 (all terms within 2 Levenshtein edits of foobar).
> Previously, if you specified any amount >=1, you got an
> IllegalArgumentException, so this won't break anyone. You can still use
> foobar~0.5, and it works just as before.
> * The default for minimumSimilarity then becomes
> LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE, which is 2. This way, if you
> just do foobar~, it's always fast.
> * The size of the priority queue is reduced by default from 1024 to a much
> more reasonable value: 50. This is what FuzzyLikeThis uses.
> I think it's best to just change the defaults for this query, since it was so
> awful before. We can add notes in migrate.txt that if you care about using
> the old values, then you should provide them explicitly, and you will get the
> same results!
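
To illustrate the proposed syntax, here is a minimal Java sketch of how the number after ~ could be interpreted: values of 1 or more as raw edit counts, values below 1 as a similarity fraction. The class FuzzyParamSketch and its methods are hypothetical and are not Lucene API. The (1 - similarity) * termLength scaling for fractional values is an assumption made for illustration, and the limit of 2 is taken from the description above.

```java
// Hypothetical helper, not part of Lucene: interprets the number after '~'.
class FuzzyParamSketch {
    // Value of LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE as stated in the issue.
    static final int MAX_SUPPORTED_DISTANCE = 2;

    // Maps the fuzzy parameter to a maximum number of edits.
    // value >= 1: treated as a raw edit count (e.g. foobar~2).
    // value <  1: treated as a minimum similarity; the scaling below is an
    //             assumption for illustration, not the committed implementation.
    static int toMaxEdits(float value, int termLength) {
        if (value < 0f) {
            throw new IllegalArgumentException("fuzzy parameter must be >= 0");
        }
        if (value >= 1f) {
            return (int) value;
        }
        return (int) ((1.0 - value) * termLength);
    }

    // True when the edit count is within the distance named in the issue.
    static boolean withinSupportedDistance(int edits) {
        return edits <= MAX_SUPPORTED_DISTANCE;
    }

    public static void main(String[] args) {
        int term = "foobar".length();              // 6 characters
        int raw = toMaxEdits(2f, term);            // foobar~2
        int frac = toMaxEdits(0.5f, term);         // foobar~0.5
        System.out.println(raw + " " + withinSupportedDistance(raw));
        System.out.println(frac + " " + withinSupportedDistance(frac));
    }
}
```

Under these assumptions, foobar~2 stays within the supported distance, while foobar~0.5 implies an edit budget that grows with the term length and can exceed it.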