> On Tuesday 12 October 2004 17:22, Doug Cutting wrote:
>> Which is worse: a person who searches for Photokopie~ in a 1000-document collection does not find documents containing Fotokopie; or a person who searches for Photokopie~ in a 1M-document collection doesn't find anything because it takes too long? I think some relevant results are better than none.
> I disagree, as the user who doesn't get the "Fotokopie" matches will not understand what's going on. He will assume that there are no such documents, which is wrong. If there's a timeout, the user will at least notice something is wrong. Besides that, it's the developer's responsibility to make things fast enough. If he decides to do so with a prefix, that might be okay for his use case.
This is clearly not a black-and-white issue. Can other Lucene developers please offer their opinions?
The question is whether the QueryParser should, by default, require a one-or-two character prefix match for fuzzy terms, or a zero-character prefix, as it does today.
The advantages of a zero-character prefix default are that it's back-compatible and that it finds more matches when the spelling differences are in the first characters.
The disadvantage of a zero-character prefix default is that it performs poorly for large collections, requiring perhaps around 10 seconds for multi-million document collections, considerably slower than any other type of query supported by the QueryParser.
Similarly, the advantage of a one-or-two-character prefix default is that it will perform much better with larger collections. The disadvantage is that it is an incompatible change, and that it will miss those matches where the spelling differences are in the first characters.
Developers may always change this by calling QueryParser.setFuzzyPrefixLength(). So at issue is which behaviour is better for developers who do not know of this parameter. Is it more important that their applications perform well or that they find all matches to fuzzy queries?
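To make the trade-off concrete, here is a minimal, self-contained sketch. It is not Lucene's implementation: the class FuzzyPrefixSketch, its Result type, and the maxEdits cut-off are hypothetical names invented for illustration, and a plain edit-distance threshold stands in for whatever similarity measure the real fuzzy query uses. It only shows why a required prefix shrinks the part of a sorted term dictionary that must be scanned, and why it can drop matches such as Fotokopie.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

// Hypothetical illustration only; none of these names come from Lucene.
class FuzzyPrefixSketch {

    /** Matches found plus the number of dictionary terms that had to be compared. */
    static class Result {
        final List<String> hits = new ArrayList<>();
        int scanned = 0;
    }

    /** Classic Levenshtein edit distance using two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Scans a sorted term dictionary for terms within maxEdits of the query.
     * With prefixLength == 0 every term is compared; with a longer prefix only
     * the contiguous run of terms sharing that prefix is compared.
     */
    static Result fuzzyMatch(SortedSet<String> terms, String query, int prefixLength, int maxEdits) {
        String prefix = query.substring(0, Math.min(prefixLength, query.length()));
        Result result = new Result();
        // Terms are sorted, so all terms with the prefix start at tailSet(prefix).
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // left the run of terms sharing the prefix
            }
            result.scanned++;
            if (editDistance(term, query) <= maxEdits) {
                result.hits.add(term);
            }
        }
        return result;
    }
}
```

Calling fuzzyMatch on a lower-cased dictionary containing both "fotokopie" and "photokopie", with query "photokopie" and maxEdits 2, finds "fotokopie" when prefixLength is 0 but not when it is 1, while the prefixed call compares only the terms that start with "p".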
Please offer your opinion and thoughts on this.
Thanks,
Doug