One possibility would be to execute the search first just to get the number of hits - see TotalHitCountCollector in recent versions of lucene, not sure when it was added - and use the hit count from that as the max docs to return. The counting only search would typically be very quick, certainly much quicker than sorting a large number of hits.
-- Ian. On Wed, Jun 22, 2011 at 10:13 PM, Tim Eck <t...@terracottatech.com> wrote: > For the searches I want to run on my index I want to return all matching > documents (as opposed to N top hits). > > > > My first naļve approach was just to use Searcher.search(query, filter, > Integer.MAX_VALUE, sort) – that is, pass Integer.MAX_VALUE for the number > of possible docs to return. That unfortunately seems to have huge heap > requirements in org.apache.lucene.util.PriorityQueue.heap as the max docID > in my index gets large. Multiply that per search heap requirement by a > handful of concurrent threads and I OOME my server. > > > > When I don’t need to do any sorting it pretty easy to just use my own > collector to gather the doc ids. Of course depending on the number of > hits I might still need a good amount of heap but at least it a factor of > the number of matches (not the index size). > > > > I’m struggling to figure out how to do the same search but with sorting. > I’m looking for a method like Searcher.search(Query, Filter, Sort, > Collector), but perhaps that isn’t a reasonable thing to have, please > enlighten me if so :-) > > > > I’m using 3.0.3 lucene-core at the moment but I don’t see that this aspect > is any different in 3.2.0. > > > > Hopefully this made sense, any help you can provide is appreciated. > > > > > > > > > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org