[EMAIL PROTECTED] wrote:
QueryParser can now handle minimumSimilarity parameter of FuzzyQuery; FuzzyQuery extended to allow for non-fuzzy prefixes.
This looks great!
It might also be good if one could set the non-fuzzy prefix length used by the QueryParser. As it stands, fuzzy queries with large indexes that use QueryParser are so slow they're unusable. But a default prefix of just a couple of characters would make a huge performance improvement.
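To illustrate why a short non-fuzzy prefix helps, here is a minimal sketch in plain Java that does not use Lucene's own classes. The class name PrefixFuzzySketch, the expand method, and the similarity formula (1 minus edit distance divided by the longer length) are assumptions made for illustration, not FuzzyQuery's actual code. The point is only that with a sorted term list, a required prefix lets the scan seek straight to the matching range and stop early, so the edit distance is computed for far fewer terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class PrefixFuzzySketch {

    // Classic dynamic-programming Levenshtein edit distance (two rows).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed similarity measure: 1 - distance / length of the longer string.
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) return 1.0;
        return 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    // Hypothetical expansion step: only terms sharing the first
    // prefixLength characters of the query are compared at all.
    static List<String> expand(SortedSet<String> terms, String query,
                               int prefixLength, double minSimilarity) {
        String prefix = query.substring(0, Math.min(prefixLength, query.length()));
        List<String> matches = new ArrayList<>();
        // tailSet seeks to the first term >= prefix; terms sharing the
        // prefix are contiguous in sorted order, so we can stop early.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) break;
            if (similarity(query, term) >= minSimilarity) matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(
            Arrays.asList("apple", "lucene", "lucent", "lucid", "mucene"));
        System.out.println(expand(terms, "lucene", 2, 0.6));
        System.out.println(expand(terms, "lucene", 0, 0.6));
    }
}
```

With a prefix length of 0 every term is compared, which is the slow case described above. With a prefix length of 2 only the terms starting with "lu" are visited, at the cost of missing variants such as "mucene" that differ in the first characters.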
That's true. We need the non-fuzzy prefix because we use prefixes in the index to distinguish between full forms (inflected words as they occur in the documents) and base forms (produced by a linguistic analysis). Wildcard, Prefix, and Fuzzy queries work on full forms; FieldQueries work on both.
I will think about extending QueryParser as you proposed (it should not be too difficult, we only have to find a reasonable syntax), but I am under some pressure with other work, so I do not know when I will find the time.
Everyone else may feel free to go ahead.
Another idea might be to limit the expanded terms by number, rather than (or in addition to) limiting them by similarity. So one could keep, e.g., just the top-scoring 100 terms whose score is greater than 0.5, or some such. This way FuzzyQuery would never trigger BooleanQuery.TooManyClauses. What do you think?
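A rough sketch of that top-N idea, again in plain Java rather than against Lucene's API. The class TopTermsSketch, the topTerms method, and the map of term scores are hypothetical names for illustration. It keeps a bounded min-heap so that at most maxTerms candidates are held at any time, and only terms scoring strictly above minScore are considered.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class TopTermsSketch {

    // Returns at most maxTerms terms whose score is strictly greater than
    // minScore, ordered from highest to lowest score.
    static List<String> topTerms(Map<String, Double> scored,
                                 int maxTerms, double minScore) {
        List<String> result = new ArrayList<>();
        if (maxTerms <= 0) return result;

        // Min-heap on score: the weakest kept term sits at the head.
        Comparator<Map.Entry<String, Double>> byScore =
            (a, b) -> Double.compare(a.getValue(), b.getValue());
        PriorityQueue<Map.Entry<String, Double>> heap = new PriorityQueue<>(byScore);

        for (Map.Entry<String, Double> e : scored.entrySet()) {
            if (e.getValue() <= minScore) continue;   // similarity cutoff
            heap.offer(e);
            if (heap.size() > maxTerms) heap.poll();  // evict the weakest
        }

        // The heap drains in ascending order; reverse for best-first.
        while (!heap.isEmpty()) result.add(heap.poll().getKey());
        Collections.reverse(result);
        return result;
    }
}
```

Because the result size is capped by maxTerms, choosing a cap below the BooleanQuery clause limit would keep the rewritten query from exceeding it. Every candidate term still has to be scored, so this bounds the size of the rewritten query, not the cost of producing it.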
That also sounds reasonable. Of course, it does not solve the efficiency problem of rewriting a FuzzyQuery. Do you think the expensive part is iterating over all terms of a field, the Levenshtein computation, or both?
I hope you like my extensions to PhraseQuery and PhrasePrefixQuery too :-)
regards, Christoph