Otis Gospodnetic wrote:
Try using HitCollector and break out of it when you collect enough documents.  
My guess is that if you are not doing anything crazy with Hits (like looping 
through them all) this won't be that much faster than using Hits.

Well, in practice it does help - see the way this is done in Nutch (src/java/org/apache/nutch/searcher/LuceneQueryOptimizer$LimitedCollector). Performance-wise, with large indexes this makes a big difference.
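
To illustrate the idea, here is a minimal, self-contained sketch of a collector that aborts the search once it has enough hits. It does not use the real Lucene or Nutch classes: SimpleCollector is a hypothetical stand-in for the HitCollector callback, search() is a fake scoring loop, and the class names and the exception-based exit are assumptions made for the example, not a copy of the Nutch code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the HitCollector callback; not the real Lucene class.
abstract class SimpleCollector {
    abstract void collect(int doc, float score);
}

// Unchecked exception used only to unwind out of the scoring loop.
class LimitExceeded extends RuntimeException {
    final int lastDoc;

    LimitExceeded(int lastDoc) {
        super("hit limit reached at doc " + lastDoc);
        this.lastDoc = lastDoc;
    }
}

// Keeps at most maxHits documents, then aborts the search by throwing.
class LimitedCollector extends SimpleCollector {
    private final int maxHits;
    final List<Integer> docs = new ArrayList<>();

    LimitedCollector(int maxHits) {
        this.maxHits = maxHits;
    }

    @Override
    void collect(int doc, float score) {
        if (docs.size() >= maxHits) {
            throw new LimitExceeded(doc);
        }
        docs.add(doc);
    }
}

class LimitedCollectorDemo {
    // Fake searcher: passes every matching doc (score > 0) to the collector in doc-id order.
    static void search(float[] scores, SimpleCollector collector) {
        for (int doc = 0; doc < scores.length; doc++) {
            if (scores[doc] > 0f) {
                collector.collect(doc, scores[doc]);
            }
        }
    }

    // Runs the search and returns whatever was collected before the limit was hit.
    static List<Integer> searchLimited(float[] scores, int maxHits) {
        LimitedCollector collector = new LimitedCollector(maxHits);
        try {
            search(scores, collector);
        } catch (LimitExceeded e) {
            // Expected: we stopped early and keep the partial results.
        }
        return collector.docs;
    }

    public static void main(String[] args) {
        float[] scores = {0.9f, 0f, 0.7f, 0.4f, 0f, 0.2f};
        System.out.println(searchLimited(scores, 2)); // prints [0, 2]
    }
}
```

The caller has to expect the exception and treat it as "enough results", not as a failure; the remaining matching documents are never scored.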

The problem you need to address, though, is how usable partial results are, i.e. whether you can be reasonably sure that by collecting only partial results you are not missing important hits that would have been found had you let the search collect all results. This facility in Nutch is used only if posting lists are sorted by decreasing document importance (see IndexSorter for details), so that we collect the highest-ranking hits first and skip the low-ranking ones.
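
A toy model of why the ordering matters, under simplifying assumptions that are mine and not taken from the Nutch code: every document matches the query, each document has one static importance value, and documents are visited in doc-id order. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SortedIndexDemo {
    // Importance values of the first `limit` documents in doc-id order,
    // i.e. what an early-terminating collector would get to see.
    static List<Double> firstN(double[] importanceByDocId, int limit) {
        List<Double> out = new ArrayList<>();
        for (double value : importanceByDocId) {
            if (out.size() >= limit) {
                break;
            }
            out.add(value);
        }
        return out;
    }

    // Models an index whose documents were renumbered by decreasing importance.
    static double[] sortByDecreasingImportance(double[] importance) {
        Double[] boxed = Arrays.stream(importance).boxed().toArray(Double[]::new);
        Arrays.sort(boxed, Comparator.reverseOrder());
        return Arrays.stream(boxed).mapToDouble(Double::doubleValue).toArray();
    }

    public static void main(String[] args) {
        double[] unsorted = {0.1, 0.9, 0.3, 0.8, 0.2};
        double[] sorted = sortByDecreasingImportance(unsorted);
        System.out.println(firstN(unsorted, 2)); // prints [0.1, 0.9]
        System.out.println(firstN(sorted, 2));   // prints [0.9, 0.8]
    }
}
```

With the unsorted index, stopping after two documents misses the 0.8 document entirely; with the sorted one, the two collected documents are the two most important.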

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


