Your timing differences are probably because of caching. But this has been mentioned many times in the archive, that a Hits object is intended to allow fast, simple retrieval of the first few documents in a result set (100 if memory serves). Each 100 or so calls to next() causes the search to be re-issued.
See HitCollector, TopDocs, etc... Erick On 3/22/07, Andreas Guther <[EMAIL PROTECTED]> wrote:
Hi, While looking into performance enhancement for our search feature I noticed a significant difference in Documents access time while looping over Hits. I wrote a test application search for a list of search terms and then for each returned Hits object loops twice over every single hits.doc(i). for (int i = 0; i < numberOfDocs; i++) {doc = hits.doc(i);} I am seeing differences like the following Found 16,215 hits for 'Water or Wine' in 219 ms Processed 16,215 docs in 53,141 ms; per single doc 3.2773 ms Processed 16,215 docs in 2,032 ms; per single doc 0.1253 ms Interestingly if I run the same test application a second time in my IDE the difference between the first and the second loop is very low. I have no explanation why I see this difference but it becomes a huge problem for us due to the fact that I need to extract from each document a small set of information pieces and the first time looping just takes too much time. I could not find any indication for an external caching of Hits. I am running my tests within Eclipse with a memory setting of -Xms766M -Xmx1024M. What is the explanation in the different access speed for the same search results? Is there a way to speed up looping over the Hits data structure? Andreas --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]