[ https://issues.apache.org/jira/browse/LUCENE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-4024:
--------------------------------

    Attachment: LUCENE-4024.patch

I agree: this crazy floating-point specification of distance makes it hairy to 
stay compatible with 3.x.

But I think this is all a huge trap. Attached is a patch that:
* removes the slow capability from FuzzyTermsEnum
* cleans up FuzzyQuery: removes the float ctors, allows transpositions as 
primitive edits, etc.
* adds a deprecated SlowFuzzyQuery to sandbox/ that has the old ctors
* adds a deprecated SlowFuzzyTermsEnum, used by SlowFuzzyQuery, which extends 
FuzzyTermsEnum and adds the slowness back.
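To illustrate what "transpositions as primitive edits" changes, here is a small dynamic-programming sketch of the optimal string alignment variant of Damerau-Levenshtein distance. The class and method names are hypothetical, and this is not Lucene's automaton-based implementation; it only shows that with transpositions enabled, swapping two adjacent characters costs one edit instead of two.

```java
/**
 * Hypothetical illustration: edit distance where an adjacent transposition
 * ("ab" -> "ba") may count as a single edit (optimal string alignment).
 * Not Lucene's implementation; a plain O(n*m) dynamic program.
 */
public final class EditDistanceSketch {

  private EditDistanceSketch() {}

  /** Edit distance between a and b; adjacent swaps cost 1 when transpositions is true. */
  public static int distance(String a, String b, boolean transpositions) {
    int n = a.length(), m = b.length();
    int[][] d = new int[n + 1][m + 1];
    for (int i = 0; i <= n; i++) d[i][0] = i; // delete every char of a's prefix
    for (int j = 0; j <= m; j++) d[0][j] = j; // insert every char of b's prefix
    for (int i = 1; i <= n; i++) {
      for (int j = 1; j <= m; j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        d[i][j] = Math.min(Math.min(
            d[i - 1][j] + 1,          // deletion
            d[i][j - 1] + 1),         // insertion
            d[i - 1][j - 1] + cost);  // substitution or match
        if (transpositions && i > 1 && j > 1
            && a.charAt(i - 1) == b.charAt(j - 2)
            && a.charAt(i - 2) == b.charAt(j - 1)) {
          d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1); // adjacent swap
        }
      }
    }
    return d[n][m];
  }

  public static void main(String[] args) {
    System.out.println(distance("lucene", "lucnee", false)); // 2
    System.out.println(distance("lucene", "lucnee", true));  // 1
  }
}
```

The practical consequence is that a typo like "lucnee" fits within one edit once transpositions are primitive, which matters when distances are capped at 2.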

I added a deprecated static helper method to FuzzyQuery that converts the old 
float similarity value to a number of edits, capped at what the automata 
support (this makes it easy to cut the query parsers over).
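A minimal sketch of such a conversion, assuming the approximate 3.x relation edits = (1 - similarity) * termLength (ignoring prefix length) and assuming the automata cap is 2 edits. The names SimilarityToEdits, toEdits and MAX_SUPPORTED_EDITS are hypothetical and are not the helper in the patch.

```java
/**
 * Hypothetical sketch: convert a legacy 3.x-style minimum similarity into a
 * maximum edit count, capped at an assumed automaton limit of 2.
 */
public final class SimilarityToEdits {
  /** Assumed cap: the largest distance the fast automaton path supports. */
  public static final int MAX_SUPPORTED_EDITS = 2;

  private SimilarityToEdits() {}

  /**
   * Values >= 1 are treated as edit counts already (the "~1", "~2" form);
   * values in [0, 1) use edits = (1 - sim) * termLength, then get capped.
   */
  public static int toEdits(float minimumSimilarity, int termLength) {
    if (minimumSimilarity < 0f || termLength < 0) {
      throw new IllegalArgumentException("negative similarity or term length");
    }
    if (minimumSimilarity >= 1f) {
      // Already an edit count: truncate and cap.
      return (int) Math.min(minimumSimilarity, MAX_SUPPORTED_EDITS);
    }
    // (int) truncates toward zero, so a product that float rounding pushes just
    // below an integer (e.g. 0.8f with length 5) loses one edit.
    int edits = (int) ((1.0 - minimumSimilarity) * termLength);
    return Math.min(edits, MAX_SUPPORTED_EDITS);
  }

  public static void main(String[] args) {
    System.out.println(toEdits(0.75f, 4)); // 1
    System.out.println(toEdits(0.5f, 10)); // 2 (5 edits, capped)
    System.out.println(toEdits(3f, 5));    // 2 (capped)
  }
}
```

Capping rather than rejecting is what lets existing ~0.xxx queries keep working on the fast path, at the cost of silently narrowing some of them.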

All tests pass, but the patch needs javadocs. In particular, I think we should 
adjust the query syntax and mark the old ~0.xxx form as deprecated, since the 
query parsers can already do ~1 and ~2 now. Then we can really clean up for 5.0.
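One way the query syntax could separate the two forms is sketched below: whole numbers after "~" are edit counts, and fractions strictly between 0 and 1 are the legacy similarity form, rejected here in a strict mode. FuzzySuffixSketch, its methods and the limit of 2 are hypothetical, not actual query parser code; how a real parser should treat the deprecated form is left open above.

```java
/**
 * Hypothetical strict-mode handling of a fuzzy suffix: "~0", "~1", "~2" are
 * edit counts; "~0.xxx" is the legacy similarity form and is rejected.
 */
public final class FuzzySuffixSketch {
  /** Assumed largest edit count the syntax should accept. */
  public static final int MAX_EDITS = 2;

  private FuzzySuffixSketch() {}

  /** Parses the number after "~"; NumberFormatException for non-numbers. */
  static float parseValue(String suffix) {
    if (suffix == null || suffix.length() < 2 || suffix.charAt(0) != '~') {
      throw new IllegalArgumentException("expected ~<number>, got: " + suffix);
    }
    float value = Float.parseFloat(suffix.substring(1));
    if (!(value >= 0f)) { // also rejects NaN
      throw new IllegalArgumentException("negative or invalid value: " + suffix);
    }
    return value;
  }

  /** True for the legacy similarity form, i.e. a value strictly between 0 and 1. */
  public static boolean isLegacySimilarity(String suffix) {
    float value = parseValue(suffix);
    return value > 0f && value < 1f;
  }

  /** Returns the edit count, rejecting legacy, fractional or too-large values. */
  public static int parseEdits(String suffix) {
    float value = parseValue(suffix);
    if (value > 0f && value < 1f) {
      throw new IllegalArgumentException("deprecated similarity syntax: " + suffix);
    }
    if (value != (int) value || value > MAX_EDITS) {
      throw new IllegalArgumentException(
          "edits must be a whole number in 0.." + MAX_EDITS + ": " + suffix);
    }
    return (int) value;
  }
}
```

For example, parseEdits("~2") returns 2, while parseEdits("~0.8") and parseEdits("~3") both throw IllegalArgumentException.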

P.S. The patch is huge since I didn't use SVN adds/removes, but that makes it 
easy to apply.
                
> FuzzyQuery should never do edit distance > 2
> --------------------------------------------
>
>                 Key: LUCENE-4024
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4024
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: 4.0
>
>         Attachments: LUCENE-4024.patch
>
>
> Edit distances 1 and 2 are now very, very fast compared to 3.x (100X-200X 
> faster) ... but edit distance 3 falls back to the super-slow scan-all-terms 
> approach from 3.x, which is not graceful degradation.
> Not sure how to fix it ... maybe we have a SlowFuzzyQuery? And FuzzyQuery 
> throws an exception if you try to ask it to be slow? Or we add a boolean (off 
> by default) that you must turn on to allow the slow one?
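The last two options in the description can be combined into a constructor guard: distances above the fast limit are rejected unless the caller explicitly opts in to the slow term scan. This is a hypothetical sketch (FuzzyGuardSketch, MAX_FAST_EDITS and allowSlow are illustrative names, and the limit of 2 is assumed from the issue title), not the approach the attached patch takes.

```java
/**
 * Hypothetical guard: reject edit distances the fast automaton path cannot
 * handle, unless the caller explicitly allows the slow scan-all-terms path.
 */
public final class FuzzyGuardSketch {
  /** Assumed largest distance the fast automaton path handles. */
  public static final int MAX_FAST_EDITS = 2;

  public final String term;
  public final int maxEdits;
  /** True when this query would fall back to scanning all terms. */
  public final boolean slowScan;

  public FuzzyGuardSketch(String term, int maxEdits, boolean allowSlow) {
    if (maxEdits < 0) {
      throw new IllegalArgumentException("maxEdits must be >= 0, got " + maxEdits);
    }
    if (maxEdits > MAX_FAST_EDITS && !allowSlow) {
      throw new IllegalArgumentException("maxEdits " + maxEdits + " exceeds "
          + MAX_FAST_EDITS + "; pass allowSlow=true to scan all terms");
    }
    this.term = term;
    this.maxEdits = maxEdits;
    this.slowScan = maxEdits > MAX_FAST_EDITS;
  }
}
```

With the flag off by default, asking for distance 3 fails loudly at construction time instead of silently degrading at search time.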


        
