I recently posted some questions about performance problems with large indexes. One key thing about our situation is that we don't need sorted results (either by relevance or any other key). I've been looking into our memory usage and tracing through some code, which in combination with the recent posts about setTermInfosIndexDivisor got me thinking about the best way to do a query where the order of results doesn't matter.
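
To make "order doesn't matter" concrete: all we need back from a search is the set of matching document ids. Below is a minimal sketch of that kind of collection. It is not our actual code; HitCollectorSketch is a hypothetical stand-in modelled on Lucene's HitCollector callback and declared locally so the example compiles without Lucene, and UnorderedCollector is likewise an illustrative name.

```java
import java.util.BitSet;

// Hypothetical stand-in for a HitCollector-style callback (doc id plus score).
// Declared here only so the sketch is self-contained; it is not a Lucene class.
abstract class HitCollectorSketch {
    abstract void collect(int doc, float score);
}

// Records each matching doc id in a bit set and throws the score away.
class UnorderedCollector extends HitCollectorSketch {
    private final BitSet hits;

    UnorderedCollector(int maxDoc) {
        // maxDoc is only an initial size hint; BitSet grows if needed.
        hits = new BitSet(maxDoc);
    }

    @Override
    void collect(int doc, float score) {
        // The score is ignored: no ranking and no priority queue.
        hits.set(doc);
    }

    BitSet hits() {
        return hits;
    }

    int count() {
        return hits.cardinality();
    }
}
```

Note that a collector like this only avoids the sorting work. The score is still computed before collect is called, which is the part I'm asking about.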
Currently we're (perhaps naively) doing the equivalent of query.weight(searcher).scorer(reader).score(collector). Obviously this does a certain amount of calculation that is unnecessary if you don't care about sorting. Are there any general recommendations for unordered searching? (We already omit norms.)

More details: of particular interest are things that access the TermInfos, since that's our major source of RAM usage. If fewer TermInfos had to be accessed, we could perhaps use an aggressive index divisor setting to save RAM without a performance penalty. For example, I was thinking about a custom Similarity implementation that skips the idf calculations, since those have to hit the TermInfos.

Thanks,
Chris