Thanks a lot for your responses.
I tried a custom HitCollector that throws an exception once the hit limit is
reached.
It works fine, and the search time is greatly reduced when a lot of docs
match the query.
Here is what I did:
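import org.apache.lucene.search.HitCollector;

public class CountCollector extends HitCollector {
    // Number of hits seen so far; read it after the search stops.
    public int cpt;
    private int _maxHit;

    public CountCollector(int maxHit)
    {
        cpt = 0;
        _maxHit = maxHit;
    }

    // Called by the searcher once per matching document.
    public void collect(int doc, float score)
    {
        cpt++;
        if (cpt > _maxHit)
        {
            // LimitIsReachedException must extend RuntimeException,
            // because collect() declares no checked exceptions.
            throw new LimitIsReachedException();
        }
    }
}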
With a simple try/catch around the search call, I catch the exception and
display "cpt" (the counter).
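For reference, here is a minimal, self-contained sketch of that try/catch usage. It does not depend on Lucene: HitCollectorStandIn and fakeSearch are hypothetical stand-ins for Lucene's HitCollector and for the real search call, and LimitIsReachedException is assumed to be an unchecked exception. With the check written as above, a counter value of maxHit + 1 means the search was cut short.

```java
// Stand-in for Lucene's HitCollector, defined here only so the sketch
// runs without the Lucene jar (assumption, not the real class).
abstract class HitCollectorStandIn {
    public abstract void collect(int doc, float score);
}

// Assumed to be unchecked, since collect() declares no checked exceptions.
class LimitIsReachedException extends RuntimeException {
}

class CountCollector extends HitCollectorStandIn {
    public int cpt = 0;
    private final int maxHit;

    CountCollector(int maxHit) {
        this.maxHit = maxHit;
    }

    @Override
    public void collect(int doc, float score) {
        cpt++;
        if (cpt > maxHit) {
            throw new LimitIsReachedException();
        }
    }
}

class LimitedSearchDemo {
    // Hypothetical replacement for the real search call: invokes
    // collect() once per matching document, as the searcher would.
    static void fakeSearch(int matchingDocs, HitCollectorStandIn collector) {
        for (int doc = 0; doc < matchingDocs; doc++) {
            collector.collect(doc, 1.0f);
        }
    }

    // Runs the fake search and returns the counter, whether or not
    // the limit stopped the search early.
    static int countHits(int matchingDocs, int maxHit) {
        CountCollector collector = new CountCollector(maxHit);
        try {
            fakeSearch(matchingDocs, collector);
        } catch (LimitIsReachedException e) {
            // Expected: the collector stopped the search on purpose.
        }
        return collector.cpt;
    }

    public static void main(String[] args) {
        System.out.println(countHits(10000, 500)); // stopped early
        System.out.println(countHits(42, 500));    // ran to completion
    }
}
```

In the real code the try block would wrap the searcher's search call that takes the collector, instead of fakeSearch.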
Best regards
----- Original Message ----
From: Andrzej Bialecki <[EMAIL PROTECTED]>
To: [email protected]
Sent: Thursday, 7 August 2008, 14:29:31
Subject: Re: Stop search process when a given number of hits is reached
Doron Cohen wrote:
> Nothing built in that I'm aware of will do this, but it can be done by
> searching with your own HitCollector.
> There is a related feature - stop search after a specified time - using
> TimeLimitedCollector.
> It is not released yet, see issue LUCENE-997.
> In short, the collector's collect() method is invoked in the search process
> for each matching document.
> Once 500 docs were collected, your collector can cause the search to stop by
> throwing an exception.
> Upon catching the exception you know that 500 docs were collected.
Two additional comments:
* the topN results from such an incomplete search may be way off if there
are high-scoring documents somewhere beyond the limit.
* if you know that there are more important and less important documents
in your corpus, and their relative weight is independent of the query
(e.g. PageRank-type score), then you can restructure your index so that
postings belonging to highly-scoring documents come first on the posting
lists - this way you have a better chance to collect highly relevant
documents first, even though the search is incomplete. You can find an
implementation of this concept in Nutch
(org.apache.nutch.indexer.IndexSorter).
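
The second point can be illustrated with a small sketch. This is not the Nutch IndexSorter code, only a toy model of the idea under stated assumptions: each document has a query-independent static score, postings are traversed in ascending doc ID order, and all names here (StaticRankOrderDemo, renumberByStaticScore, remapPostings, firstHits) are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

class StaticRankOrderDemo {
    // Builds a map oldDocId -> newDocId so that documents with a higher
    // static score receive smaller new doc IDs.
    static int[] renumberByStaticScore(float[] staticScore) {
        Integer[] order = new Integer[staticScore.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort old doc IDs by descending static score (stable for ties).
        Arrays.sort(order, Comparator.comparingDouble((Integer d) -> -staticScore[d]));
        int[] oldToNew = new int[staticScore.length];
        for (int newId = 0; newId < order.length; newId++) {
            oldToNew[order[newId]] = newId;
        }
        return oldToNew;
    }

    // Rewrites one posting list into the new doc IDs, kept in ascending
    // order because that is the order in which postings are read.
    static int[] remapPostings(int[] postings, int[] oldToNew) {
        int[] out = new int[postings.length];
        for (int i = 0; i < postings.length; i++) {
            out[i] = oldToNew[postings[i]];
        }
        Arrays.sort(out);
        return out;
    }

    // Models an early-terminated search: only the first `limit`
    // postings are ever seen by the collector.
    static int[] firstHits(int[] postings, int limit) {
        return Arrays.copyOf(postings, Math.min(limit, postings.length));
    }

    public static void main(String[] args) {
        float[] staticScore = {0.1f, 0.9f, 0.5f, 0.7f};
        int[] postings = {0, 1, 3}; // old doc IDs matching some term
        int[] oldToNew = renumberByStaticScore(staticScore);
        int[] sorted = remapPostings(postings, oldToNew);
        System.out.println(Arrays.toString(firstHits(postings, 2)));
        System.out.println(Arrays.toString(firstHits(sorted, 2)));
    }
}
```

In the unsorted list, a limit of 2 stops after old documents 0 and 1 and misses document 3 (static score 0.7). After renumbering, the first two postings are the two documents with the highest static scores.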
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com