Re: Caching filter wrapper (was Re: RE : DateFilter.Before/After)

Doug Cutting Tue, 16 Sep 2003 09:29:39 -0700

Bruce Ritchie wrote:

The times shown above are only the time taken to call the following code (numResults is the smaller of 1500 and hits.length()):
// numResults = Math.min(1500, hits.length()); each hits.doc(i) loads a stored Document
for (int i = 0; i < numResults; i++) {
    ids[i] = Long.parseLong(hits.doc(i).get("messageID"));
}

This is not a recommended way to use Lucene. The intent is that you should only have to call Hits.doc() for documents that you actually display, usually around 10 per query. Is this still a bottleneck when you fetch a max of 10 or 20 documents?
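A minimal sketch of the display-only pattern, assuming results are shown one page at a time. HitList and CountingHits are hypothetical stand-ins for Lucene's Hits (length() and doc(i).get(field)), defined locally so the example runs without Lucene; only the hits on the requested page have their stored fields loaded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Lucene's Hits: a cheap length() and an expensive field lookup.
interface HitList {
    int length();
    String storedField(int i, String field); // models hits.doc(i).get(field)
}

// Fake result list that counts how many stored documents were loaded.
final class CountingHits implements HitList {
    private final int total;
    int loads = 0;

    CountingHits(int total) { this.total = total; }

    public int length() { return total; }

    public String storedField(int i, String field) {
        loads++;                           // each call models one stored-document load
        return String.valueOf(1000L + i);  // pretend messageID
    }
}

final class PageFetcher {
    /** Loads stored fields only for hits in [start, start + pageSize). */
    static List<Long> fetchPage(HitList hits, int start, int pageSize) {
        int end = Math.min(start + pageSize, hits.length());
        List<Long> ids = new ArrayList<>();
        for (int i = start; i < end; i++) {
            ids.add(Long.parseLong(hits.storedField(i, "messageID")));
        }
        return ids;
    }
}
```

Fetching the first page of 1500 hits this way loads 10 documents instead of 1500.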

So I'd be interested to hear why you need 1500 hits. My guess is that you're doing post-processing of hits, then selecting 10 or so to actually display. If you can figure out a way to do this post-processing without accessing the document object, e.g., through the query, a custom HitCollector, or the SearchBean, then this optimization is probably not needed.
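One way to do such post-processing inside a custom HitCollector, sketched under assumptions: the abstract class below only mirrors the collect(int doc, float score) shape of Lucene 1.x's HitCollector so the example compiles on its own, and forumIdByDoc is a hypothetical per-document table (indexed by document number) built once per index, not something Lucene provides. The collector filters hits and keeps the k best scores without ever touching a Document; with Lucene it would be passed to Searcher.search(query, collector).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

// Local mirror of the shape of Lucene 1.x's HitCollector, so this sketch compiles without Lucene.
abstract class HitCollector {
    public abstract void collect(int doc, float score);
}

// Hypothetical post-processing: keep only hits whose cached forum id is allowed,
// and retain the k best-scoring of those, without loading any Document.
final class FilteringTopKCollector extends HitCollector {
    static final class ScoredDoc {
        final int doc;
        final float score;
        ScoredDoc(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int[] forumIdByDoc;       // hypothetical table: forum id for each doc number
    private final Set<Integer> allowedForums;
    private final int k;
    // Min-heap on score: the root is the weakest of the current top k.
    private final PriorityQueue<ScoredDoc> heap =
        new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));

    FilteringTopKCollector(int[] forumIdByDoc, Set<Integer> allowedForums, int k) {
        if (k < 1) throw new IllegalArgumentException("k must be at least 1");
        this.forumIdByDoc = forumIdByDoc;
        this.allowedForums = allowedForums;
        this.k = k;
    }

    @Override
    public void collect(int doc, float score) {
        if (!allowedForums.contains(forumIdByDoc[doc])) return; // filter via the cached table
        if (heap.size() < k) {
            heap.add(new ScoredDoc(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll();                                        // drop the current weakest hit
            heap.add(new ScoredDoc(doc, score));
        }
    }

    /** Doc numbers of the retained hits, best score first. */
    List<Integer> topDocs() {
        List<ScoredDoc> sorted = new ArrayList<>(heap);
        sorted.sort((a, b) -> Float.compare(b.score, a.score));
        List<Integer> docs = new ArrayList<>();
        for (ScoredDoc sd : sorted) docs.add(sd.doc);
        return docs;
    }
}
```

Only the 10 or so document numbers returned by topDocs() would then need Hits.doc() or Searcher.doc() for display.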

A 30% optimization to a slow algorithm is better than nothing, but it would be better yet to improve the algorithm. That said, this sort of improvement is not always trivial, and lots of people use Lucene the way you have, so it may still be worth optimizing this.

If your post-processing is done in order to sort the results, then I recommend trying the SearchBean, in the Lucene sandbox. I've never used it myself, but it is able to provide results sorted by any field without accessing the document object of each hit while the query is processed (it caches tables of field values when constructed). Examining the SearchBean code, I see a possible optimization: it would be more efficient if it used a HitCollector rather than a Hits when sorting, as the Hits may have to re-query a few times to get the full set of results. Even without that change, I suspect you'd see a speedup.
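The sorting idea can be sketched the same way: collect every matching document number in a single HitCollector pass (no re-querying), then sort by a cached table of field values. This is not SearchBean's actual code; the HitCollector class only mirrors Lucene 1.x's collect(int doc, float score) shape, keyByDoc is a hypothetical String[] indexed by document number, and the sketch assumes every document has a value for the sort field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Local mirror of the shape of Lucene 1.x's HitCollector, so this sketch compiles without Lucene.
abstract class HitCollector {
    public abstract void collect(int doc, float score);
}

// Gathers all matching doc numbers in one pass, then orders them by a cached field value.
final class FieldSortingCollector extends HitCollector {
    private final String[] keyByDoc;          // hypothetical cached table: field value per doc number
    private final List<Integer> docs = new ArrayList<>();

    FieldSortingCollector(String[] keyByDoc) { this.keyByDoc = keyByDoc; }

    @Override
    public void collect(int doc, float score) {
        docs.add(doc);                        // the score is ignored when sorting by field
    }

    /** First n matching doc numbers in ascending order of the cached field value. */
    List<Integer> firstN(int n) {
        List<Integer> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparing((Integer d) -> keyByDoc[d]));
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }
}
```

The table is built once per index, so each query costs one pass over the hits plus a sort of document numbers, with no stored-field reads.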

I wonder if SearchBean, or something like it, should be added to the core? This is something lots of folks ask for. SearchBean's technique can use a fair amount of memory, but most folks are not short on RAM these days. One could optimize SearchBean's sorting for integer-valued fields, but that could also be done after it is added to the core.
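As an illustration of what an integer-valued specialization might look like (an assumption, not SearchBean code): keep the field in an int[] indexed by document number, pack each value together with its document number into a long, and sort the primitive array instead of comparing objects or Strings. Such a table costs 4 bytes per document, about 4 MB for a million-document index. IntFieldSorter is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical integer specialization of field sorting: values cached in an int[]
// indexed by doc number, sorted as packed primitive longs.
final class IntFieldSorter {
    private final int[] valueByDoc; // 4 bytes per document in the index

    IntFieldSorter(int[] valueByDoc) { this.valueByDoc = valueByDoc; }

    /** Returns matching doc numbers ordered by ascending field value (ties by doc number). */
    int[] sort(int[] matchingDocs) {
        long[] packed = new long[matchingDocs.length];
        for (int i = 0; i < matchingDocs.length; i++) {
            int doc = matchingDocs[i];
            // High 32 bits: the signed field value; low 32 bits: the non-negative doc number.
            packed[i] = ((long) valueByDoc[doc] << 32) | (doc & 0xFFFFFFFFL);
        }
        Arrays.sort(packed); // primitive sort, no comparator or boxing
        int[] result = new int[packed.length];
        for (int i = 0; i < packed.length; i++) {
            result[i] = (int) packed[i]; // low 32 bits recover the doc number
        }
        return result;
    }
}
```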

What do folks think about adding SearchBean to the core? Perhaps it could be merged with the existing Hits code, as a primary API for accessing search results?

Doug

