I don't agree with suggesting the slow, unscalable approach (SlowFuzzyQuery). This query supports edit distances of up to 2, including transpositions; anything beyond that is basically going to match a significant portion of the term dictionary and not be very useful.
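To make "up to 2 edits, including transpositions" concrete, here is a brute-force sketch of the distance metric in plain Java, with no Lucene dependency. It is only an illustration under assumptions, not how FuzzyQuery works internally: Lucene builds a Levenshtein automaton (LevenshteinAutomata) and intersects it with the term dictionary rather than scoring every term. The class `EditDistanceDemo`, its methods, and the toy word list are hypothetical, and the metric used here is the "optimal string alignment" variant of Damerau-Levenshtein, which may not agree with Lucene's transposition handling in every edge case.

```java
import java.util.List;

/**
 * Hypothetical brute-force illustration: Levenshtein distance where swapping
 * two adjacent characters (a transposition) also counts as one edit
 * ("optimal string alignment" variant). Lucene does NOT compare terms one by
 * one like this; it intersects a Levenshtein automaton with the term dictionary.
 */
public class EditDistanceDemo {

    /** Insert, delete, substitute, or transpose adjacent chars; each costs 1. */
    static int osaDistance(String a, String b) {
        int n = a.length(), m = b.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i; // delete all of a's prefix
        for (int j = 0; j <= m; j++) d[0][j] = j; // insert all of b's prefix
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
                // Adjacent transposition, e.g. "ca" -> "ac", counts as a single edit.
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[n][m];
    }

    /** Counts how many terms in a (toy) dictionary lie within maxEdits of the query. */
    static long countMatches(String query, List<String> terms, int maxEdits) {
        return terms.stream().filter(t -> osaDistance(query, t) <= maxEdits).count();
    }

    public static void main(String[] args) {
        // Hypothetical toy dictionary, chosen only to show how fast the neighbourhood grows.
        List<String> terms = List.of("cat", "cap", "cut", "act", "dog",
                                     "cart", "at", "bat", "coat", "tact");
        for (int k = 0; k <= 3; k++) {
            System.out.println("maxEdits=" + k + " -> "
                    + countMatches("cat", terms, k) + " of " + terms.size() + " terms");
        }
    }
}
```

With this toy list, a query for "cat" at maxEdits=1 already matches 8 of the 10 terms, which hints at why distances beyond 2 stop being selective on real term dictionaries, especially for short terms.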
If someone has special data where larger edit distances make sense, they should use an n-gram indexing technique, the spellchecker module, BLAST, or something other than Lucene.

As for the constant of 2: it actually is in the javadocs; you just have to click through to CONSTANT VALUES.

On Thu, Sep 13, 2012 at 12:03 PM, Jack Krupansky wrote:
> The automaton support for FuzzyQuery added the severe limitation to
> FuzzyQuery of an editing distance of 2 that needs to be documented in the
> Javadoc. A reference to SlowFuzzyQuery is also needed in the Javadoc, even
> though that class is deprecated.
>
> The constructor Javadoc does say “maxEdits - must be >= 0 and <=
> LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE”, but neither the text nor
> that link documents the extreme limitation of 2. I mean, a casual reader
> might reasonably expect that it is just some big number like
> Integer.MAX_VALUE. The rationale from the Jira should be succinctly stated,
> at the class level as well.
>
> Relevant Jira:
> https://issues.apache.org/jira/browse/LUCENE-4024
>
> -- Jack Krupansky -- lucidworks.com
