Thanks for the idea, Ian. I still need to think about it, but the race between running the total-count search and then the sorted search worries me. I have pretty specific visibility guarantees I must provide on this data (with respect to concurrent updates). It'd be a bummer to have to block all concurrent updates just to get these two searches to operate on an unchanging index.
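
One way to sidestep the race might be to stay with a single search pass: gather every match together with its sort key into a growable list, then sort once collection is done. Below is a rough sketch of that idea in plain Java. It does not use the Lucene API; SortedHitGatherer, Hit, collect and sortedHits are hypothetical names, and it assumes the sort field is a single string value that can be looked up per document.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a custom collector. This is plain Java, NOT the
// Lucene Collector API; all names here are made up for illustration.
final class SortedHitGatherer {

    /** One matching document: its id plus the value of the field to sort on. */
    static final class Hit {
        final int docId;
        final String sortValue;

        Hit(int docId, String sortValue) {
            this.docId = docId;
            this.sortValue = sortValue;
        }
    }

    // Grows with the number of matches, not with the size of the index.
    private final List<Hit> hits = new ArrayList<>();

    /** Called once per matching document during the single search pass. */
    void collect(int docId, String sortValue) {
        hits.add(new Hit(docId, sortValue));
    }

    /** Returns a sorted copy: by sort value, ties broken by ascending doc id. */
    List<Hit> sortedHits() {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparing((Hit h) -> h.sortValue)
                .thenComparingInt(h -> h.docId));
        return sorted;
    }
}
```

The heap cost here is one small object per match, so it can still be large for broad queries, and a real collector would have to look up the sort value for each document itself; that lookup is left out of the sketch.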

I don't want to accuse anyone of bad code, but always preallocating a potentially large array in org.apache.lucene.util.PriorityQueue seems non-ideal for the search I want to run. I'll have to dig into some more Lucene code :-)

FYI: TotalHitCountCollector looks like it was added in 3.1.0.

-----Original Message-----
From: Ian Lea [mailto:ian....@gmail.com]
Sent: Thursday, June 23, 2011 3:12 AM
To: java-user@lucene.apache.org
Subject: Re: field sorted searches with unbounded hit count

One possibility would be to execute the search first just to get the
number of hits - see TotalHitCountCollector in recent versions of
lucene, not sure when it was added - and use the hit count from that
as the max docs to return. The counting-only search would typically
be very quick, certainly much quicker than sorting a large number of
hits.

--
Ian.

On Wed, Jun 22, 2011 at 10:13 PM, Tim Eck <t...@terracottatech.com> wrote:
> For the searches I want to run on my index I want to return all matching
> documents (as opposed to N top hits).
>
> My first naïve approach was just to use Searcher.search(query, filter,
> Integer.MAX_VALUE, sort) – that is, pass Integer.MAX_VALUE for the number
> of possible docs to return. That unfortunately seems to have huge heap
> requirements in org.apache.lucene.util.PriorityQueue.heap as the max docID
> in my index gets large. Multiply that per-search heap requirement by a
> handful of concurrent threads and I OOME my server.
>
> When I don’t need to do any sorting it’s pretty easy to just use my own
> collector to gather the doc ids. Of course, depending on the number of
> hits I might still need a good amount of heap, but at least it’s a factor
> of the number of matches (not the index size).
>
> I’m struggling to figure out how to do the same search but with sorting.
> I’m looking for a method like Searcher.search(Query, Filter, Sort,
> Collector), but perhaps that isn’t a reasonable thing to have, please
> enlighten me if so :-)
>
> I’m using 3.0.3 lucene-core at the moment but I don’t see that this aspect
> is any different in 3.2.0.
>
> Hopefully this made sense, any help you can provide is appreciated.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org