On Thu, 2011-06-23 at 22:41 +0200, Tim Eck wrote:
> I don't want to accuse anyone of bad code but always preallocating a
> potentially large array in org.apache.lucene.util.PriorityQueue seems
> non-ideal for the search I want to run.
The current implementation of IndexSearcher uses threaded search, where each slice collects docIDs independently and then adds them to a shared PriorityQueue one at a time. With this architecture, making the PriorityQueue size-optimized would require either multiple resizings (more GC activity, slightly more processing) or waiting for all search threads to finish before constructing the queue (longer response time).

The current implementation works really well when requesting small result sets. It is not so fine for larger sets (partly because of memory allocation, partly because the standard heap-based priority queue has horrible locality, making it perform rather badly when it cannot be contained in the cache) and, as you have observed, really bad for the full document set. Finding a better general solution that covers all three cases is a real challenge, a very interesting one I might add.

Of course one can always special-case, but using a Collector seems like the way to go there.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
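To make the resizing trade-off concrete, here is a minimal sketch of a bounded min-heap that starts with a small backing array and doubles it on demand, rather than preallocating the full requested size up front. This is not Lucene's actual PriorityQueue (the class name and all details are mine for illustration); it just shows the cost structure discussed above: small result sets waste little memory, but a heap that actually fills pays for the intermediate array copies.

```java
import java.util.Arrays;

// Illustrative only: a top-N min-heap (keeps the N largest values seen)
// whose backing array grows lazily instead of being preallocated at maxSize.
public class LazyPriorityQueue {
    private int[] heap;        // 1-based binary min-heap; index 0 unused
    private int size;
    private final int maxSize;

    public LazyPriorityQueue(int maxSize) {
        this.maxSize = maxSize;
        // Start small rather than allocating maxSize + 1 slots immediately.
        this.heap = new int[Math.min(maxSize + 1, 16)];
    }

    // Insert a value, evicting the current smallest once the queue is full.
    public void insertWithOverflow(int value) {
        if (size < maxSize) {
            if (size + 1 >= heap.length) grow();   // the resizing cost
            heap[++size] = value;
            upHeap(size);
        } else if (size > 0 && value > heap[1]) {
            heap[1] = value;                       // replace the smallest
            downHeap(1);
        }
    }

    public int top() { return heap[1]; }           // smallest retained value
    public int size() { return size; }

    private void grow() {
        int newLen = Math.min(heap.length * 2, maxSize + 1);
        heap = Arrays.copyOf(heap, newLen);        // extra copy + garbage
    }

    private void upHeap(int i) {
        int v = heap[i];
        while (i > 1 && v < heap[i >>> 1]) {
            heap[i] = heap[i >>> 1];
            i >>>= 1;
        }
        heap[i] = v;
    }

    private void downHeap(int i) {
        int v = heap[i];
        while (true) {
            int child = i << 1;
            if (child > size) break;
            if (child + 1 <= size && heap[child + 1] < heap[child]) child++;
            if (heap[child] >= v) break;
            heap[i] = heap[child];
            i = child;
        }
        heap[i] = v;
    }

    public static void main(String[] args) {
        LazyPriorityQueue pq = new LazyPriorityQueue(3);
        for (int v : new int[] {5, 1, 9, 7, 3}) pq.insertWithOverflow(v);
        System.out.println("size=" + pq.size() + " smallest-of-top-3=" + pq.top());
    }
}
```

Note that under threaded search every insertWithOverflow on a shared queue would also need synchronization, which is exactly why deferring queue construction until all threads finish is the other side of the trade-off.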