Doron Cohen wrote:
Nothing built in that I'm aware of will do this, but it can be done by
searching with your own HitCollector.
There is a related feature - stop search after a specified time - using
TimeLimitedCollector.
It is not released yet, see issue LUCENE-997.
In short, the collector's collect() method is invoked in the search process
for each matching document.
Once 500 docs have been collected, your collector can stop the search by
throwing an exception.
When you catch that exception, you know that 500 docs were collected.
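A minimal sketch of that pattern follows. It is self-contained and does not link against Lucene: the Collector class and the search() loop are hypothetical stand-ins for HitCollector and the searcher, and LimitReachedException is a name made up for illustration. The exception is unchecked because collect() declares no checked exceptions.

```java
import java.util.ArrayList;
import java.util.List;

public class LimitedCollectDemo {

    /** Hypothetical stand-in for Lucene's HitCollector: one callback per matching doc. */
    abstract static class Collector {
        abstract void collect(int doc, float score);
    }

    /** Made-up exception type; unchecked so it can escape collect(). */
    static class LimitReachedException extends RuntimeException {
        LimitReachedException(int limit) {
            super("collected " + limit + " docs");
        }
    }

    /** Remembers doc ids and aborts the search once the limit is reached. */
    static class LimitedCollector extends Collector {
        private final int limit;
        final List<Integer> docs = new ArrayList<>();

        LimitedCollector(int limit) {
            this.limit = limit;
        }

        @Override
        void collect(int doc, float score) {
            docs.add(doc);
            if (docs.size() >= limit) {
                throw new LimitReachedException(limit);
            }
        }
    }

    /** Hypothetical search loop: pretends docs 0..matchingDocs-1 all match the query. */
    static void search(int matchingDocs, Collector collector) {
        for (int doc = 0; doc < matchingDocs; doc++) {
            collector.collect(doc, 1.0f);
        }
    }

    /** Runs the search and reports whether it was cut short by the collector. */
    static boolean run(int matchingDocs, LimitedCollector collector) {
        try {
            search(matchingDocs, collector);
            return false; // search ran to completion
        } catch (LimitReachedException e) {
            return true;  // limit hit; collector.docs holds what was gathered
        }
    }

    public static void main(String[] args) {
        LimitedCollector collector = new LimitedCollector(500);
        boolean truncated = run(10_000, collector);
        System.out.println("truncated=" + truncated + ", collected=" + collector.docs.size());
    }
}
```

With a real index, the same collector body would go into a HitCollector subclass passed to the searcher, with the try/catch wrapped around the search call.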
Two additional comments:
* the topN results from such an incomplete search may be way off if there
are high-scoring documents somewhere beyond the limit.
* if you know that there are more important and less important documents
in your corpus, and their relative weight is independent of the query
(e.g. PageRank-type score), then you can restructure your index so that
postings belonging to high-scoring documents come first on the posting
lists - this way you have a better chance to collect highly relevant
documents first, even though the search is incomplete. You can find an
implementation of this concept in Nutch
(org.apache.nutch.indexer.IndexSorter).
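
To illustrate the second point, here is a rough sketch of the renumbering idea. It is not the Nutch IndexSorter code. It assumes a query-independent score per document and plain int arrays as posting lists, and the names oldToNew and remap are made up for the example. Documents are renumbered so that the highest static score gets id 0. Since posting lists are traversed in ascending doc id order, a search that stops early has then already seen the most important documents.

```java
import java.util.Arrays;
import java.util.Comparator;

public class SortedPostingsDemo {

    /** Maps each old doc id to a new id, where new id 0 has the highest static score. */
    static int[] oldToNew(float[] staticScore) {
        Integer[] byScore = new Integer[staticScore.length];
        for (int i = 0; i < byScore.length; i++) {
            byScore[i] = i;
        }
        // Sort old doc ids by descending static score (the sort is stable for ties).
        Arrays.sort(byScore, Comparator.comparingDouble((Integer d) -> -staticScore[d]));

        int[] map = new int[staticScore.length];
        for (int newId = 0; newId < byScore.length; newId++) {
            map[byScore[newId]] = newId;
        }
        return map;
    }

    /** Rewrites one posting list into the new id space, in ascending new-id order. */
    static int[] remap(int[] postings, int[] oldToNew) {
        int[] out = new int[postings.length];
        for (int i = 0; i < postings.length; i++) {
            out[i] = oldToNew[postings[i]];
        }
        Arrays.sort(out);
        return out;
    }

    public static void main(String[] args) {
        float[] staticScore = {0.1f, 0.9f, 0.3f, 0.7f}; // e.g. a PageRank-type score
        int[] map = oldToNew(staticScore);
        int[] postings = {0, 2, 3};                      // old doc ids containing some term

        System.out.println(Arrays.toString(map));                   // [3, 0, 2, 1]
        System.out.println(Arrays.toString(remap(postings, map)));  // [1, 2, 3]
    }
}
```

In the remapped list, new id 1 is old doc 3 (score 0.7), so it is now the first posting visited instead of the last.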
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com