While working with Lucene, I have noticed that when it performs a search, it retrieves the documents for the same term more than once. I think this may be because the Hits collection only stores a certain number of items, but would it not be better to simply increase the size of the Hits collection rather than perform the extra, relatively expensive read of the term docs?
The following is the trace output from Lucene performing two single-term searches and one multiple-term search. Notice that in each case the documents for a term are requested twice.

    expression = +epson, query = +text:epson
    findTermInfo() text:epson, time = 0
    SearchTermDocs, seek() on text:epson
    SearchTermDocs, seek() on text:epson [cached]
    find, hits = 224, query time = 16, doc (150) time = 15, total time = 31

    expression = +printer, query = +text:printer
    findTermInfo() text:printer, time = 16
    SearchTermDocs, seek() on text:printer
    SearchTermDocs, seek() on text:printer [cached]
    find, hits = 5358, query time = 62, doc (150) time = 282, total time = 344

    expression = +epson +printer, query = +text:epson +text:printer
    SearchTermDocs, seek() on text:epson [cached]
    SearchTermDocs, seek() on text:printer [cached]
    SearchTermDocs, seek() on text:epson [cached]
    SearchTermDocs, seek() on text:printer [cached]
    find, hits = 175, query time = 15, doc (150) time = 47, total time = 62

To limit the performance hit, our implementation caches the returned docs within a query (the [cached] tag), but it seems the issue would be better addressed by the Lucene engine itself.

Any thoughts on this?
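For anyone who wants to picture the caching mentioned above, here is a minimal sketch of the general idea. It is not our actual code and does not use Lucene's API: the class name TermDocsCache, the seek() and diskReads() methods, and the reader function standing in for the expensive term-docs read are all hypothetical, chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: memoizes the document list for each term so that a
// repeated seek() on the same term does not trigger a second expensive read.
public class TermDocsCache {
    // Maps a term such as "text:epson" to the doc ids read for it.
    private final Map<String, int[]> cache = new HashMap<>();
    // Stand-in for the expensive read of a term's documents (an assumption).
    private final Function<String, int[]> reader;
    // Counts how many times the expensive read was actually performed.
    private int diskReads = 0;

    public TermDocsCache(Function<String, int[]> reader) {
        this.reader = reader;
    }

    // Returns the docs for a term, reading them only on the first request.
    public int[] seek(String term) {
        int[] docs = cache.get(term);
        if (docs == null) {
            docs = reader.apply(term);
            diskReads++;
            cache.put(term, docs);
        }
        return docs;
    }

    public int diskReads() {
        return diskReads;
    }

    public static void main(String[] args) {
        // Fake reader returning a fixed doc list for every term.
        TermDocsCache cache = new TermDocsCache(term -> new int[] {1, 2, 3});
        cache.seek("text:epson");   // first request: performs the read
        cache.seek("text:epson");   // second request: served from the cache
        System.out.println("expensive reads: " + cache.diskReads());
    }
}
```

Creating one such cache per query and discarding it afterwards keeps memory bounded, at the cost of re-reading the term docs on the next query. It only hides the duplicate request; it does not explain why the engine asks twice in the first place.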
