[
https://issues.apache.org/jira/browse/LUCENE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir updated LUCENE-4024:
--------------------------------
Attachment: LUCENE-4024.patch
I agree: this crazy floating-point specification of distance makes it hairy to
stay compatible with 3.x.
But I think this is all a huge trap. Attached is a patch that:
* Removes the slow capability from FuzzyTermsEnum.
* Cleans up FuzzyQuery: removes the float ctors, allows transpositions as
primitive edits, etc.
* Adds a deprecated SlowFuzzyQuery to sandbox/ that has the old ctors.
* Adds a deprecated SlowFuzzyTermsEnum that it uses, which extends
FuzzyTermsEnum and adds the slowness back.
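To illustrate what "transpositions as primitive edits" means for the distance itself, here is a minimal sketch in plain Java. It is not the automaton-based code in the patch: it uses the simple dynamic-programming (optimal string alignment) formulation, and the class and method names are hypothetical.

```java
// Hypothetical illustration only; the patch itself works with Levenshtein automata.
final class EditDistanceSketch {

    // Edit distance between a and b. When transpositions is true, swapping
    // two adjacent characters counts as one edit instead of two.
    static int distance(String a, String b, boolean transpositions) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i; // delete all of a's first i chars
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j; // insert all of b's first j chars
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1,   // deletion
                                 d[i][j - 1] + 1),  // insertion
                        d[i - 1][j - 1] + cost);    // substitution or match
                if (transpositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    // adjacent swap treated as a single primitive edit
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[a.length()][b.length()];
    }
}
```

With this sketch, "lucene" vs. "lucnee" is 2 edits under classic Levenshtein but only 1 edit once transpositions are primitive, so such a typo stays well inside the fast range.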
I added a (deprecated) static helper method to FuzzyQuery that converts from
the old float similarity to a number of edits, but capped at what the automata
support (this is used to easily cut over the query parsers).
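A rough sketch of what such a conversion could look like, in plain Java with no Lucene dependency. The class name, the method name, the constant and the exact rounding are assumptions for illustration and may differ from the attached patch; the cap of 2 comes from the issue title.

```java
// Hypothetical stand-in for the deprecated helper; not copied from the patch.
final class FuzzyEditsSketch {

    // Assumed cap: the largest edit distance the automata support.
    static final int MAX_SUPPORTED_EDITS = 2;

    // Converts an old-style float "minimum similarity" to a number of edits.
    // Values >= 1 are read as an edit count already; values in (0, 1) are
    // scaled by the term length. Either way the result is capped.
    static int floatToEdits(float minimumSimilarity, int termLen) {
        if (minimumSimilarity >= 1f) {
            return (int) Math.min(minimumSimilarity, MAX_SUPPORTED_EDITS);
        } else if (minimumSimilarity == 0.0f) {
            return 0; // treated as exact match here, not "any number of edits"
        } else {
            int edits = (int) ((1D - minimumSimilarity) * termLen);
            return Math.min(edits, MAX_SUPPORTED_EDITS);
        }
    }
}
```

For example, under these assumptions a similarity of 0.5 on a 6-character term would ask for 3 edits and is capped to 2, while 0.75 on a 4-character term maps to 1 edit.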
All tests pass, but the patch needs javadocs. In particular, I think we should
adjust the query syntax and mark the old ~0.xxx stuff as deprecated, since the
QPs can already do ~1 and ~2 now. Then we can really clean up for 5.0.
P.S. The patch is huge since I didn't use SVN adds/removes, but that makes it
easy to apply.
> FuzzyQuery should never do edit distance > 2
> --------------------------------------------
>
> Key: LUCENE-4024
> URL: https://issues.apache.org/jira/browse/LUCENE-4024
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-4024.patch
>
>
> Edit distance 1 and 2 are now very, very fast compared to 3.x (100X-200X
> faster) ... but edit distance 3 will fall back to the super-slow scan of all
> terms from 3.x, which is not graceful degradation.
> Not sure how to fix it ... maybe we have a SlowFuzzyQuery? And FuzzyQuery
> throws an exception if you try to ask it to be slow? Or, we add a boolean
> (off by default) that you must turn on to allow the slow one..?