Re: Limiting Result-Count

Andrzej Bialecki Thu, 29 Jun 2006 14:32:45 -0700

Otis Gospodnetic wrote:

Try using HitCollector and break out of it when you collect enough documents.  
My guess is that if you are not doing anything crazy with Hits (like looping 
through them all) this won't be that much faster than using Hits.

Well, in practice it does help - see the way this is done in Nutch (src/java/org/apache/nutch/searcher/LuceneQueryOptimizer$LimitedCollector). Performance-wise, with large indexes this makes a big difference.
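
To make the "break out of it" part concrete, here is a minimal sketch of a limiting collector. It is written so that it runs without Lucene on the classpath: the Collector interface is only a stand-in for Lucene's HitCollector callback (collect(int doc, float score)), and search(), LimitExceeded and the sample doc ids are hypothetical names made up for the illustration. The real LimitedCollector in Nutch may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for Lucene's HitCollector callback (assumption: lets the sketch run without Lucene).
interface Collector {
    void collect(int doc, float score);
}

// Unchecked exception used only to unwind out of the search loop.
class LimitExceeded extends RuntimeException {
}

// Collects at most maxHits documents, then aborts the search by throwing.
class LimitedCollector implements Collector {
    private final int maxHits;
    final List<Integer> docs = new ArrayList<>();

    LimitedCollector(int maxHits) {
        this.maxHits = maxHits;
    }

    @Override
    public void collect(int doc, float score) {
        if (docs.size() >= maxHits) {
            throw new LimitExceeded();
        }
        docs.add(doc);
    }
}

class LimitedCollectorSketch {
    // Hypothetical searcher: feeds matching doc ids to the collector in index order.
    static void search(int[] matchingDocs, Collector collector) {
        for (int doc : matchingDocs) {
            collector.collect(doc, 1.0f);
        }
    }

    // Runs the search and returns whatever was collected before the limit was hit.
    static List<Integer> searchLimited(int[] matchingDocs, int maxHits) {
        LimitedCollector collector = new LimitedCollector(maxHits);
        try {
            search(matchingDocs, collector);
        } catch (LimitExceeded e) {
            // Expected: enough hits were collected, the remaining matches are skipped.
        }
        return collector.docs;
    }

    public static void main(String[] args) {
        System.out.println(searchLimited(new int[] {0, 3, 4, 7, 9, 12}, 3)); // prints [0, 3, 4]
    }
}
```

Throwing an unchecked exception is used here because the callback has no return value through which it could ask the searcher to stop; the caller catches it and keeps the hits gathered so far.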

The problem that you need to address, though, is how usable partial results are, i.e. whether you can be reasonably sure that by collecting only partial results you are not missing important hits that would have been found had you let the search collect all results. This facility in Nutch is used only if posting lists are sorted by decreasing document importance (see IndexSorter for details), so that we collect the most highly ranked hits first and skip the low-ranking ones.
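
A toy illustration of why the ordering matters, under the simplifying assumption that each matching document has a single integer importance value and that ranking depends only on it. The class name, method names and numbers are invented for the example; this does not show how IndexSorter itself works.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PartialResultsSketch {
    // What a limited collector sees: the first `limit` entries in posting-list order.
    static List<Integer> firstN(List<Integer> postingOrder, int limit) {
        int end = Math.min(limit, postingOrder.size());
        return new ArrayList<>(postingOrder.subList(0, end));
    }

    // What a full search would rank on top: the `limit` highest importance values.
    static List<Integer> topN(List<Integer> importances, int limit) {
        List<Integer> sorted = new ArrayList<>(importances);
        sorted.sort(Comparator.reverseOrder());
        return firstN(sorted, limit);
    }

    public static void main(String[] args) {
        List<Integer> unsorted = List.of(2, 9, 1, 7, 5); // arbitrary index order
        List<Integer> sorted = List.of(9, 7, 5, 2, 1);   // decreasing importance

        System.out.println(firstN(unsorted, 2)); // prints [2, 9]
        System.out.println(firstN(sorted, 2));   // prints [9, 7]
        System.out.println(topN(unsorted, 2));   // prints [9, 7]
    }
}
```

With the arbitrary order, cutting off after two hits misses the document with importance 7; with the sorted order, the truncated result is the same as the top of the full result.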


--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



