Re: field sorted searches with unbounded hit count

Ian Lea Thu, 23 Jun 2011 03:13:14 -0700

One possibility would be to execute the search first just to get the
number of hits (see TotalHitCountCollector in recent versions of
Lucene; I'm not sure when it was added) and then use that hit count
as the max docs to return in the sorted search.  The counting-only
search would typically be very quick, certainly much quicker than
sorting a large number of hits.
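
A rough sketch of the idea, written in plain Java with no Lucene
dependency so it can be run on its own. Every name in it
(TwoPassSortedSearch, countHits, topSorted, searchAllSorted) is made up
for illustration and is not Lucene API. countHits stands in for a search
with TotalHitCountCollector, and topSorted stands in for the sorted
search, whose queue is assumed to be sized by the number of docs
requested, as the heap usage described in the quoted message suggests.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Predicate;

// Hypothetical stand-in for the two-pass approach; not Lucene code.
final class TwoPassSortedSearch {

    // Pass 1: count the matches without storing or sorting anything.
    static <T> int countHits(Iterable<T> docs, Predicate<T> matches) {
        int count = 0;
        for (T doc : docs) {
            if (matches.test(doc)) {
                count++;
            }
        }
        return count;
    }

    // Pass 2: keep the best `limit` matches in a heap sized by `limit`.
    static <T> List<T> topSorted(Iterable<T> docs, Predicate<T> matches,
                                 Comparator<T> order, int limit) {
        if (limit <= 0) {
            return new ArrayList<>(); // nothing requested, allocate nothing
        }
        // Reversed comparator: the head of the heap is the worst retained hit.
        PriorityQueue<T> heap = new PriorityQueue<>(limit, order.reversed());
        for (T doc : docs) {
            if (!matches.test(doc)) {
                continue;
            }
            if (heap.size() < limit) {
                heap.add(doc);
            } else if (order.compare(doc, heap.peek()) < 0) {
                heap.poll(); // evict the current worst hit
                heap.add(doc);
            }
        }
        List<T> result = new ArrayList<>(heap);
        result.sort(order);
        return result;
    }

    // Count first, then ask the sorted pass for exactly that many hits.
    static <T> List<T> searchAllSorted(Iterable<T> docs, Predicate<T> matches,
                                       Comparator<T> order) {
        int hits = countHits(docs, matches);
        return topSorted(docs, matches, order, hits);
    }
}
```

The queue in the second pass is allocated for the number of matches
rather than for Integer.MAX_VALUE or the index size. In the real thing
the count would come from TotalHitCountCollector.getTotalHits() and be
passed as the number of docs to return; I'd keep a guard for the
zero-hit case there as well.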



--
Ian.


On Wed, Jun 22, 2011 at 10:13 PM, Tim Eck <[email protected]> wrote:
> For the searches I want to run on my index I want to return all matching
> documents (as opposed to N top hits).
>
>
>
> My first naïve approach was just to use Searcher.search(query, filter,
> Integer.MAX_VALUE, sort) – that is, pass Integer.MAX_VALUE for the number
> of possible docs to return. That unfortunately seems to have huge heap
> requirements in org.apache.lucene.util.PriorityQueue.heap as the max docID
> in my index gets large. Multiply that per search heap requirement by a
> handful of concurrent threads and I OOME my server.
>
>
>
> When I don’t need to do any sorting it’s pretty easy to just use my own
> collector to gather the doc ids.  Of course, depending on the number of
> hits I might still need a good amount of heap, but at least it’s a factor
> of the number of matches (not the index size).
>
>
>
> I’m struggling to figure out how to do the same search but with sorting.
> I’m looking for a method like Searcher.search(Query, Filter, Sort,
> Collector), but perhaps that isn’t a reasonable thing to have, please
> enlighten me if so :-)
>
>
>
> I’m using 3.0.3 lucene-core at the moment but I don’t see that this aspect
> is any different in 3.2.0.
>
>
>
> Hopefully this made sense, any help you can provide is appreciated.
