On May 25, 2005, at 7:00 AM, Barbara Krausz wrote:
Hi,
Consider a query with, e.g., 4 terms (t1, t2, t3, t4). I want to
retrieve all documents which contain at least, e.g., 3 of the
query terms. How can I implement this?
The first idea is to use BooleanQueries such as
(t1 and t2 and t3 and t4) or (t1 and t2 and t3) or (t1 and t2 and
t4) or (t1 and t3 and t4) ...
But the performance is not very good when I have 20 query terms and I
want to retrieve all docs which contain at least 15 of the terms.
Can I modify the skipTo algorithm in ConjunctionScorer in order to
achieve this?
Thanks
Barbara
PS: Has anybody written a statistics class which reports how many terms
and how many distinct terms are in the index, and perhaps computes the
mean length of the documents in the index with the standard deviation?
There is an interesting trick you can play with a custom Similarity
class on a BooleanQuery - check out the coord method. This could be
used to ensure that an "overlap" of 3 is mandatory for a match, for
example.
I'll leave the details of this as an exercise to the reader for the
moment.
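
For anyone who wants a starting point, here is a minimal sketch of the
idea, not a tested Lucene implementation. It is a standalone class that
only mirrors the shape of Similarity's coord(int overlap, int maxOverlap)
method so it compiles without Lucene on the classpath. The class name
MinOverlapCoord and the minOverlap parameter are made up for
illustration.

```java
// Standalone sketch of a coord factor with a mandatory minimum overlap.
// Hypothetical names: MinOverlapCoord, minOverlap. No Lucene dependency.
class MinOverlapCoord {
    // Smallest number of matching query terms a document must have.
    private final int minOverlap;

    MinOverlapCoord(int minOverlap) {
        if (minOverlap < 0) {
            throw new IllegalArgumentException("minOverlap must be >= 0");
        }
        this.minOverlap = minOverlap;
    }

    // overlap: number of query terms matched by the document.
    // maxOverlap: total number of terms in the query.
    float coord(int overlap, int maxOverlap) {
        if (maxOverlap <= 0 || overlap < minOverlap) {
            return 0.0f; // too few terms matched: zero out the score
        }
        return overlap / (float) maxOverlap; // usual overlap fraction
    }

    public static void main(String[] args) {
        MinOverlapCoord sim = new MinOverlapCoord(3);
        for (int overlap = 0; overlap <= 4; overlap++) {
            System.out.println(overlap + " of 4 terms -> coord "
                    + sim.coord(overlap, 4));
        }
    }
}
```

In Lucene itself the same logic would presumably go into an override of
coord in a subclass of DefaultSimilarity, installed on the searcher with
setSimilarity, with all terms added to the BooleanQuery as optional
clauses. Note the assumption this relies on: a score of zero only removes
a document if the hit collection drops zero-score hits, so check that
against the version you are running.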
Erik