Doug Cutting wrote:

Daniel Naber wrote:

On Tuesday 12 October 2004 17:22, Doug Cutting wrote:

Which is worse: a person who searches for Photokopie~ in a 1000-document
collection does not find documents containing Fotokopie; or a person who
searches for Photokopie~ in a 1M-document collection doesn't find
anything because it takes too long? I think some relevant results are
better than none.


I disagree, as the user who doesn't get the "Fotokopie" matches will not understand what's going on. He will assume that there are no such documents, which is wrong. If there's a timeout, the user will at least notice that something is wrong. Besides that, it's the developer's responsibility to make things fast enough. If he decides to do so by using a prefix, that might be okay for his use case.
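The behavior being argued about can be illustrated with a toy model. This is a minimal Java sketch under stated assumptions, not Lucene's FuzzyQuery code: it assumes a simplified rule in which a term matches if it shares its first prefixLength characters with the query term and lies within a fixed number of edits (Lucene's FuzzyQuery uses a similarity threshold instead of a fixed edit count). The class name FuzzyPrefixDemo, the method fuzzyMatches and the term list are hypothetical. "Photokopie" and "Fotokopie" are two edits apart (P to F, drop the h), so with a prefix length of 0 the variant spelling is found, while a prefix length of 1 excludes it because the first letters differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: a simplified model of prefix-restricted fuzzy matching.
public class FuzzyPrefixDemo {

    // Classic dynamic-programming Levenshtein (edit) distance, two rows at a time.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed rule (not Lucene's scoring): the first prefixLength characters must
    // match exactly, and the whole term must be within maxEdits edits of the query.
    static boolean matches(String query, String term, int prefixLength, int maxEdits) {
        if (query.length() < prefixLength || term.length() < prefixLength) return false;
        if (!term.regionMatches(0, query, 0, prefixLength)) return false;
        return editDistance(query, term) <= maxEdits;
    }

    // Returns every term from the (hypothetical) term list that matches the query.
    static List<String> fuzzyMatches(String query, List<String> terms, int prefixLength, int maxEdits) {
        List<String> result = new ArrayList<>();
        for (String t : terms) {
            if (matches(query, t, prefixLength, maxEdits)) result.add(t);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("Photokopie", "Fotokopie", "Photographie");
        System.out.println("prefix 0: " + fuzzyMatches("Photokopie", terms, 0, 2));
        System.out.println("prefix 1: " + fuzzyMatches("Photokopie", terms, 1, 2));
    }
}
```

Only the prefix-0 run includes Fotokopie; the prefix-1 run skips it without any error, which is exactly the silent miss Daniel describes. The trade-off on the other side is that a non-zero prefix lets a real engine avoid comparing the query against every term in the index.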

My personal opinion, together with the experience I've gained over the last few years in the area of information retrieval, favors Daniel's idea of setting the prefix length to 0 by default. My personal arguments are:

1) Most of the developers using Lucene, either as a basis for or as an enhancement to their own products, will deal with indexes of no more than 10,000 documents. This group of developers is happy to have an API that is easy to use and does exactly what they expect. They don't worry about internal features; they just use it the way they got it. With such an index size, they will never run into a timeout or performance problem, and they're happy to find all documents matching a fuzzy query.

2) Developers handling large document collections with more than 1M docs will study the possibilities and options Lucene offers for optimizing their system. They will find the knob that needs turning when they run into timeouts or memory problems. If not, they will ask the community for a hint.

3) I would keep the functional behavior of Lucene backward compatible in future versions as far as possible. Changing the API, deprecating methods and so on, is no problem: modern development environments display deprecation warnings and help developers update their software. But they can't help us if the query results differ when moving from Lucene 1.4 to Lucene 1.9.

Bernhard





---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


