The following suggestion is weaker than the requested functionality, but maybe you'll find the concept useful for ignoring so-called "garbage" results.

Assume that the query is a simple OR query made of a few words. By examining the frequencies of these words in the index (their DFs), devise a synthetic document that is the worst document you would be willing to accept as a useful result. Alternatively, ignore DFs and create a few such documents, each perhaps containing one or a few of the query words (and likely many other words).

Now virtually add the synthetic document(s) to the index. This can be done by creating a small in-memory index and creating a multiIndexReader on top of the real index and the dummy one.

Now execute the query with a filter that accepts only the synthetic documents. The score of the worst acceptable document(s) can be used as a threshold when running the query on the original index.

It is inefficient, since it has to be done for each query, it would be hard to implement for general queries, and I never tried it...

Doron

2008/8/8 Александр Аристов <[EMAIL PROTECTED]>

> Query independent means that the threshold should have the same relevance
> for all queries and discard found docs below it. The current scoring
> implementation doesn't guarantee that, say, two documents found in two
> queries which both got a score of 0.5 are of the same quality.
>
> I don't want to discard docs from being indexed, no. But I want to be sure
> that two docs with the same score in two different queries have the same
> quality (they contain the same set of found terms, length, etc.)
>
> Alexander
>
> -----Original Message-----
> From: Andrzej Bialecki <[EMAIL PROTECTED]>
> To: java-dev@lucene.apache.org
> Date: Thu, 07 Aug 2008 22:44:46 +0200
> Subject: Re: lucene scoring
>
> Александр Аристов wrote:
> > I want to implement searching with the ability to set a so-called
> > confidence level below which I would treat documents as garbage. I
> > cannot define the level per query, as the level should be relevant
> > for all documents.
>
> Hmm .. I'm not sure if I understand it properly - if the level is
> query-independent, then it's a constant factor, which you can put in a
> field during the index creation - and then you could use a Filter or
> FunctionQuery to exclude documents with this factor below the threshold.
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
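
As a rough illustration of the synthetic-document threshold idea described at the top of this message, here is a small self-contained Java sketch. It does not use Lucene at all: the scorer is a toy TF-IDF formula chosen only for the example, and the class and method names (SyntheticThreshold, threshold, search) are hypothetical. It also assumes two things the message leaves open: when there are several synthetic documents the lowest of their scores is taken as the threshold, and the real documents are scored with the same statistics as the synthetic ones so the numbers are comparable.

```java
import java.util.*;

// Toy sketch, NOT Lucene: documents and queries are plain lists of terms.
final class SyntheticThreshold {

    // Number of documents each term occurs in (the DFs).
    static Map<String, Integer> documentFrequencies(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs)
            for (String term : new HashSet<>(doc))
                df.merge(term, 1, Integer::sum);
        return df;
    }

    // Assumed toy formula: sum of sqrt(tf) * idf over the query terms,
    // normalized by sqrt(document length). Lucene's real scoring differs.
    static double score(List<String> doc, List<String> query,
                        Map<String, Integer> df, int numDocs) {
        if (doc.isEmpty()) return 0.0;
        double sum = 0.0;
        for (String term : query) {
            int tf = Collections.frequency(doc, term);
            if (tf == 0) continue;
            double idf = 1.0 + Math.log((double) numDocs / (df.getOrDefault(term, 0) + 1));
            sum += Math.sqrt(tf) * idf;
        }
        return sum / Math.sqrt(doc.size());
    }

    // "Virtually add" the synthetic documents: statistics come from real + synthetic.
    static List<List<String>> combined(List<List<String>> real, List<List<String>> synthetic) {
        List<List<String>> all = new ArrayList<>(real);
        all.addAll(synthetic);
        return all;
    }

    // Score only the synthetic documents; the lowest score is the threshold (assumption).
    // With no synthetic documents the threshold is +infinity, so nothing is accepted.
    static double threshold(List<List<String>> real, List<List<String>> synthetic,
                            List<String> query) {
        List<List<String>> all = combined(real, synthetic);
        Map<String, Integer> df = documentFrequencies(all);
        double min = Double.POSITIVE_INFINITY;
        for (List<String> doc : synthetic)
            min = Math.min(min, score(doc, query, df, all.size()));
        return min;
    }

    // Indices of the real documents scoring at least as high as the threshold.
    static List<Integer> search(List<List<String>> real, List<List<String>> synthetic,
                                List<String> query) {
        List<List<String>> all = combined(real, synthetic);
        Map<String, Integer> df = documentFrequencies(all);
        double cutoff = threshold(real, synthetic, query);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < real.size(); i++)
            if (score(real.get(i), query, df, all.size()) >= cutoff) hits.add(i);
        return hits;
    }

    public static void main(String[] args) {
        List<List<String>> real = List.of(
            List.of("lucene", "scoring"),
            List.of("cats", "dogs", "birds"));
        // Worst acceptable result: one query word among unrelated filler words.
        List<List<String>> synthetic = List.of(List.of("lucene", "x", "x", "x"));
        List<String> query = List.of("lucene", "scoring");
        System.out.println("threshold = " + threshold(real, synthetic, query));
        System.out.println("accepted  = " + search(real, synthetic, query));
    }
}
```

Note that the threshold is recomputed for every query, which is the inefficiency mentioned above. With a real index the synthetic documents would slightly change the DFs, so scores from the combined reader and from the original index would not match exactly.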