Re: cvs commit: jakarta-lucene/src/java/org/apache/lucene/search FuzzyQuery.java FuzzyTermEnum.java

Christoph Goller Wed, 15 Sep 2004 02:58:19 -0700

Doug Cutting wrote:

> [EMAIL PROTECTED] wrote:
>>   QueryParser can now handle minimumSimilarity parameter
>>   of FuzzyQuery; FuzzyQuery extended to allow for non-fuzzy
>>   prefixes.
>
> This looks great!
>
> It might also be good if one could set the non-fuzzy prefix length used by the QueryParser. As it stands, fuzzy queries with large indexes that use QueryParser are so slow they're unusable. But a default prefix of just a couple of characters would make a huge performance improvement.

That's true. We need the non-fuzzy prefix because we distinguish between full forms (inflected words as they occur in the documents) and base forms (after a linguistic analysis) in the index by using prefixes. Wildcard/Prefix/Fuzzy queries work on full forms, FieldQueries on both.
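
To illustrate why even a short non-fuzzy prefix helps, here is a minimal sketch, not the code from the commit. It assumes the field's terms are available in sorted order (a TreeSet stands in for the term dictionary), and the class and method names are made up for this example. Only terms that share the first prefixLength characters with the query term remain candidates for the fuzzy comparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene.
class PrefixCandidates {

    /**
     * Returns the terms that share the first prefixLength characters with
     * the query term. Only these would need a fuzzy comparison.
     */
    static List<String> candidates(SortedSet<String> terms, String query, int prefixLength) {
        int n = Math.min(prefixLength, query.length());
        String prefix = query.substring(0, n);
        List<String> result = new ArrayList<>();
        // tailSet starts at the first term >= prefix in the sorted set.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can share the prefix
            }
            result.add(term);
        }
        return result;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(
                Arrays.asList("lucene", "lucent", "lucid", "luck", "search", "seat"));
        // Prefix length 0 keeps every term as a candidate.
        System.out.println(candidates(terms, "lucene", 0).size());
        // Prefix length 4 ("luce") leaves only two candidates.
        System.out.println(candidates(terms, "lucene", 4));
    }
}
```

With a prefix length of 0 every term of the field has to be compared; with a prefix of a few characters the scan stops as soon as the sorted terms leave the prefix range.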

I will think about extending QueryParser as you proposed (should not be too difficult, we only have to find a reasonable syntax), but I am a little bit under pressure with other stuff. So I do not know when I will find time. Everyone else may feel free to go ahead.

> Another idea might be, rather than (or in addition to) limiting the number of expanded terms by similarity, to limit them by number. So one could keep, e.g., just the top-scoring 100 terms whose score is greater than 0.5, or somesuch. This way FuzzyQuery would never trigger BooleanQuery.TooManyClauses. What do you think?
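
The "keep only the top-scoring N terms" idea can be sketched with a bounded min-heap. This is only an illustration under assumptions: TopTerms, ScoredTerm and select are hypothetical names, not Lucene classes, and the scores are taken as given rather than computed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Lucene.
class TopTerms {

    static final class ScoredTerm {
        final String text;
        final float score;

        ScoredTerm(String text, float score) {
            this.text = text;
            this.score = score;
        }
    }

    /**
     * Keeps at most maxTerms terms whose score is greater than
     * minSimilarity, returned best first.
     */
    static List<ScoredTerm> select(List<ScoredTerm> candidates, float minSimilarity, int maxTerms) {
        Comparator<ScoredTerm> byScore = Comparator.comparingDouble((ScoredTerm t) -> t.score);
        // Min-heap: the weakest term kept so far sits at the head.
        PriorityQueue<ScoredTerm> heap = new PriorityQueue<>(byScore);
        for (ScoredTerm t : candidates) {
            if (t.score <= minSimilarity) {
                continue; // below the similarity threshold
            }
            heap.offer(t);
            if (heap.size() > maxTerms) {
                heap.poll(); // evict the weakest term
            }
        }
        List<ScoredTerm> result = new ArrayList<>(heap);
        result.sort(byScore.reversed());
        return result;
    }
}
```

Because the heap never holds more than maxTerms + 1 entries, the number of clauses in the rewritten query is bounded by maxTerms no matter how many terms pass the similarity threshold.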


Also sounds reasonable. Of course it does not solve the efficiency problem
of rewriting a FuzzyQuery. Do you think the expensive part is going through
all terms of a field or is it the Levenshtein computation, or both?
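
For reference, a textbook Levenshtein computation looks roughly like the sketch below. It is not the FuzzyTermEnum code; the class name is made up, and the normalization in similarity() (one minus distance over the longer length) is an assumption chosen for illustration. Each comparison costs time proportional to the product of the two term lengths, and that cost is paid once for every term enumerated.

```java
// Hypothetical helper, not part of Lucene.
class EditDistance {

    /** Classic dynamic-programming Levenshtein distance using two rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]; the normalization is an assumption for this sketch. */
    static float similarity(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        if (longer == 0) {
            return 1.0f; // two empty strings are identical
        }
        return 1.0f - (float) levenshtein(a, b) / longer;
    }
}
```

Timing such a routine separately from a plain pass over the field's terms would be one way to see which of the two parts dominates for a given index.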

I hope you like my extensions to PhraseQuery and PhrasePrefixQuery too :-)

regards,
Christoph


