[ 
https://issues.apache.org/jira/browse/LUCENE-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914787#action_12914787
 ] 

Simon Willnauer commented on LUCENE-2667:
-----------------------------------------


bq. I think it's best to just change the defaults for this query, since it was 
so awful before. We can add notes in migrate.txt that if you care about using 
the old values, then you should provide them explicitly, and you will get the 
same results!
+1 

Thanks, Robert, for bringing this up. The changes to the query parsers look 
good to me. I have only one comment, about the Harmony code: could you put the 
svn path and revision into a comment, so we can track possible changes later 
more easily? I personally think moving to 1.6 is far away :)


bq. I propose:

+1 to all the proposals 


> Fix FuzzyQuery's defaults, so its fast.
> ---------------------------------------
>
>                 Key: LUCENE-2667
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2667
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-2667.patch
>
>
> We worked a lot on FuzzyQuery, but you need to be a rocket scientist to 
> ensure good results.
> The main problem is that the default distance is 0.5f, which doesn't take 
> into account the length of the string.
> To add insult to injury, the default number of expansions is 1024 
> (traditionally from BooleanQuery maxClauseCount)
> I propose:
> * The syntax of FuzzyQuery is enhanced, so that you can specify raw edits 
> too: such as foobar~2 (all terms within 2 Levenshtein edits of foobar). 
> Previously if you specified any amount >=1, you got IllegalArgumentException, 
> so this won't break anyone. You can still use foobar~0.5, and it works just 
> as before.
> * The default for minimumSimilarity then becomes 
> LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE, which is 2. This way if you 
> just do foobar~, it's always fast.
> * The size of the priority queue is reduced by default from 1024 to a much 
> more reasonable value: 50. This is what FuzzyLikeThis uses.
> I think it's best to just change the defaults for this query, since it was so 
> awful before. We can add notes in migrate.txt that if you care about using 
> the old values, then you should provide them explicitly, and you will get the 
> same results!
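
For illustration only, the proposed reading of the number after '~' might be 
sketched as below. This is not the attached patch: the class and method names 
are hypothetical, and the similarity-to-edits conversion assumes that 
similarity is defined as 1 - edits / termLength. The two constants are the 
proposed defaults stated in the description above.

```java
// Hypothetical sketch of the proposal; names are illustrative, not Lucene's API.
final class FuzzyParamSketch {
    // Proposed default edit distance (LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE per the issue).
    static final int DEFAULT_MAX_EDITS = 2;
    // Proposed default priority-queue size, reduced from 1024.
    static final int DEFAULT_MAX_EXPANSIONS = 50;

    private FuzzyParamSketch() {}

    /** Values >= 1 are read as raw edit counts (foobar~2); values below 1 as similarities (foobar~0.5). */
    static boolean isRawEdits(float param) {
        return param >= 1f;
    }

    /**
     * Number of edits a '~' parameter allows for a term of the given length.
     * Assumption: similarity = 1 - edits / termLength, so a similarity permits
     * (1 - similarity) * termLength edits, rounded down.
     */
    static int impliedEdits(float param, int termLength) {
        if (param < 0f || termLength < 0) {
            throw new IllegalArgumentException("param and termLength must be non-negative");
        }
        if (isRawEdits(param)) {
            return (int) param; // raw edits: independent of term length
        }
        return (int) ((1.0 - param) * termLength); // similarity: grows with term length
    }
}
```

Under that assumption, impliedEdits(0.5f, 6) allows 3 edits while 
impliedEdits(0.5f, 20) allows 10, whereas impliedEdits(2f, n) stays at 2 for 
any term length.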

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


