On May 25, 2005, at 7:00 AM, Barbara Krausz wrote:
Hi,

Consider a query with, e.g., 4 terms (t1, t2, t3, t4). I want to retrieve all documents that contain at least, e.g., 3 of the query terms. How can I implement this?
My first idea is to use BooleanQueries such as
(t1 and t2 and t3 and t4) or (t1 and t2 and t3) or (t1 and t2 and t4) or (t1 and t3 and t4) ...

But the performance is not very good when I have 20 query terms and want to retrieve all docs that contain at least 15 of them. Can I modify the skipTo algorithm in ConjunctionScorer to achieve this?
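
To put a number on why the expanded query gets slow, here is a small plain-Java sketch that counts the AND-groups such an expansion would contain if every subset of at least the required size is listed. The class and method names are made up for illustration and nothing here touches Lucene.

```java
class ClauseCount {
    // Binomial coefficient C(n, k). The running product stays an exact
    // integer at every step; long is ample for the small n used here.
    static long choose(int n, int k) {
        long result = 1;
        for (int i = 1; i <= k; i++) {
            result = result * (n - k + i) / i;
        }
        return result;
    }

    // Number of AND-groups when every subset of the query terms with
    // at least minMatch members is written out as its own conjunction.
    static long andGroups(int terms, int minMatch) {
        long total = 0;
        for (int size = minMatch; size <= terms; size++) {
            total += choose(terms, size);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(andGroups(4, 3));   // prints 5
        System.out.println(andGroups(20, 15)); // prints 21700
    }
}
```

Listing only the subsets of exactly 15 terms would already be enough, since any document with more matches also satisfies one of those, but that still leaves choose(20, 15) = 15504 conjunctions.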

Thanks
Barbara

PS: Has anybody written a Statistics class that reports how many terms and how many distinct terms are in the index, and perhaps computes the mean length of the documents in the index along with the standard deviation?
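
As a rough illustration of the numbers asked for in the PS, the following plain-Java sketch computes them over documents that are already tokenized and held in memory. It does not read a Lucene index; the class name IndexStatsSketch is hypothetical, and using the population standard deviation (dividing by the number of documents) is an assumption.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class IndexStatsSketch {
    final long totalTerms;      // all term occurrences across all documents
    final int distinctTerms;    // size of the vocabulary
    final double meanLength;    // mean number of terms per document
    final double stdDevLength;  // population standard deviation of the lengths

    IndexStatsSketch(List<List<String>> docs) {
        Set<String> vocabulary = new HashSet<String>();
        long total = 0;
        for (List<String> doc : docs) {
            total += doc.size();
            vocabulary.addAll(doc);
        }
        double mean = docs.isEmpty() ? 0.0 : (double) total / docs.size();

        // Sum of squared deviations of each document length from the mean.
        double sumSquares = 0.0;
        for (List<String> doc : docs) {
            double diff = doc.size() - mean;
            sumSquares += diff * diff;
        }

        totalTerms = total;
        distinctTerms = vocabulary.size();
        meanLength = mean;
        stdDevLength = docs.isEmpty() ? 0.0 : Math.sqrt(sumSquares / docs.size());
    }
}
```

For three documents of lengths 2, 4 and 6 this gives a mean length of 4 and a standard deviation of sqrt(8/3).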

There is an interesting trick you can play with a custom Similarity class on a BooleanQuery - check out the coord method. This could be used to ensure that an "overlap" of 3 is mandatory for a match, for example.

I'll leave the details of this as an exercise to the reader for the moment.
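
For anyone taking up the exercise, here is a minimal sketch of the idea in plain Java rather than against the Lucene classes. In Lucene the hook would be an override of Similarity's coord(int overlap, int maxOverlap); the three-argument coord below, the doc helper and the search loop are stand-ins invented for illustration, and the approach assumes that documents whose score ends up as zero are left out of the results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MinMatchSketch {
    // Stand-in for an overridden coord(overlap, maxOverlap): zero when fewer
    // than minMatch query terms matched, otherwise the plain fraction matched.
    static float coord(int overlap, int maxOverlap, int minMatch) {
        if (overlap < minMatch) {
            return 0.0f;
        }
        return overlap / (float) maxOverlap;
    }

    // Hypothetical helper: a "document" is just its set of terms.
    static Set<String> doc(String... terms) {
        return new HashSet<String>(Arrays.asList(terms));
    }

    // Returns the positions of the documents whose coord factor is non-zero.
    // Assumes the query terms are distinct.
    static List<Integer> search(List<Set<String>> docs, List<String> query, int minMatch) {
        List<Integer> hits = new ArrayList<Integer>();
        for (int i = 0; i < docs.size(); i++) {
            int overlap = 0;
            for (String term : query) {
                if (docs.get(i).contains(term)) {
                    overlap++;
                }
            }
            if (coord(overlap, query.size(), minMatch) > 0.0f) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Set<String>> docs = Arrays.asList(
            doc("t1", "t2", "t3"),
            doc("t1", "t2"),
            doc("t1", "t2", "t3", "t4"),
            doc("t4", "x"));
        List<String> query = Arrays.asList("t1", "t2", "t3", "t4");
        System.out.println(search(docs, query, 3)); // prints [0, 2]
    }
}
```

The query itself stays a flat OR of the individual terms, so its size grows linearly with the number of terms instead of combinatorially.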

    Erik


