RE: field sorted searches with unbounded hit count

Tim Eck Thu, 23 Jun 2011 13:42:26 -0700

Thanks for the idea Ian. I still need to think about it, but the race between 
running the total count search and then the sorted search worries me. I have 
very specific visibility guarantees I must provide on this data (with 
respect to concurrent updates). It'd be a bummer to have to block all 
concurrent updates to get these two searches to operate on an unchanging index.
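
A minimal sketch of one way to avoid that race without blocking writers, in plain Java rather than Lucene's API: both passes read the same immutable snapshot while updates publish new ones. SnapshotSearch and its methods are hypothetical names invented for illustration. In Lucene the rough analogue would be running both searches against the same IndexReader, which is a point-in-time view; whether that satisfies the visibility guarantees mentioned above is not something this sketch can settle.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.Collectors;

/** Hypothetical model: writers publish immutable snapshots, readers pin one. */
class SnapshotSearch {
    private final AtomicReference<List<Integer>> current =
            new AtomicReference<>(Collections.<Integer>emptyList());

    /** Copy-on-write update; snapshots already handed out are never modified. */
    public void add(int value) {
        current.updateAndGet(old -> {
            List<Integer> next = new ArrayList<>(old);
            next.add(value);
            return Collections.unmodifiableList(next);
        });
    }

    /** Pins the state that both searches will run against. */
    public List<Integer> snapshot() {
        return current.get();
    }

    /** Pass 1: count the values >= min in the pinned snapshot. */
    public static int count(List<Integer> snap, int min) {
        int hits = 0;
        for (int v : snap) {
            if (v >= min) hits++;
        }
        return hits;
    }

    /** Pass 2: the same matches, sorted ascending, capped at limit. */
    public static List<Integer> sortedMatches(List<Integer> snap, int min, int limit) {
        return snap.stream()
                .filter(v -> v >= min)
                .sorted()
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

Because the count and the sorted pass read the same snapshot, an update that lands between them cannot make the two results disagree; it only becomes visible to searches that pin a later snapshot.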


I don't want to accuse anyone of bad code but always preallocating a 
potentially large array in org.apache.lucene.util.PriorityQueue seems non-ideal 
for the search I want to run. I'll have to dig into some more lucene code :-) 
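
To make the alternative concrete, here is a rough sketch in plain Java (not Lucene code) of gathering matches into a buffer that grows with the number of hits and sorting them afterwards. CollectThenSort, DocIdCollector and the sortKeys array are hypothetical stand-ins for a custom collector and a per-document sort field; nothing here is taken from org.apache.lucene.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.IntPredicate;

/** Hypothetical model of "collect the hits, then sort them"; not Lucene code. */
class CollectThenSort {

    /** Gathers doc ids into an int[] that starts small and doubles on demand. */
    static final class DocIdCollector {
        private int[] docs = new int[16];
        private int size;

        void collect(int doc) {
            if (size == docs.length) {
                docs = Arrays.copyOf(docs, size * 2);
            }
            docs[size++] = doc;
        }

        int[] toArray() {
            return Arrays.copyOf(docs, size);
        }
    }

    /** Returns matching doc ids ordered by sortKeys[doc], ascending. */
    public static int[] searchSorted(int[] sortKeys, IntPredicate matches) {
        DocIdCollector collector = new DocIdCollector();
        for (int doc = 0; doc < sortKeys.length; doc++) {
            if (matches.test(doc)) {
                collector.collect(doc);
            }
        }
        // The sort is stable, so docs with equal keys stay in doc id order.
        return Arrays.stream(collector.toArray())
                .boxed()
                .sorted(Comparator.comparingInt((Integer d) -> sortKeys[d]))
                .mapToInt(Integer::intValue)
                .toArray();
    }
}
```

The buffer here is sized by the number of matches rather than by the index size, at the cost of sorting all hits at the end instead of maintaining a heap during collection.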

FYI: TotalHitCountCollector looks like it was added in 3.1.0



-----Original Message-----
From: Ian Lea [mailto:ian....@gmail.com] 
Sent: Thursday, June 23, 2011 3:12 AM
To: java-user@lucene.apache.org
Subject: Re: field sorted searches with unbounded hit count

One possibility would be to execute the search first just to get the
number of hits - see TotalHitCountCollector in recent versions of
lucene, not sure when it was added - and use the hit count from that
as the max docs to return.  The counting only search would typically
be very quick, certainly much quicker than sorting a large number of
hits.
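
A rough sketch of that two-pass idea, written as a plain-Java model rather than against Lucene's API: the first pass only counts, and the second pass uses a heap sized by that count instead of by the index size. CountThenSort, countHits, topSorted and the sortKeys array are hypothetical names for illustration; in Lucene the first pass would presumably be the TotalHitCountCollector search.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntPredicate;

/** Hypothetical model of "count first, then sort"; not Lucene code. */
class CountThenSort {

    /** Pass 1: count matching doc ids without storing any of them. */
    public static int countHits(int maxDoc, IntPredicate matches) {
        int hits = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (matches.test(doc)) hits++;
        }
        return hits;
    }

    /** Pass 2: keep at most limit matches, ordered by sortKeys[doc] ascending. */
    public static List<Integer> topSorted(int[] sortKeys, IntPredicate matches, int limit) {
        List<Integer> result = new ArrayList<>();
        if (limit <= 0) return result; // PriorityQueue rejects a capacity below 1
        // Max-heap on the sort key, sized by the hit count rather than by maxDoc.
        PriorityQueue<Integer> heap = new PriorityQueue<>(limit,
                (a, b) -> Integer.compare(sortKeys[b], sortKeys[a]));
        for (int doc = 0; doc < sortKeys.length; doc++) {
            if (!matches.test(doc)) continue;
            heap.add(doc);
            if (heap.size() > limit) heap.poll(); // evict the largest key
        }
        while (!heap.isEmpty()) result.add(heap.poll()); // largest key first
        Collections.reverse(result); // ascending by sort key
        return result;
    }

    /** Runs both passes, using the hit count as the limit for the sorted pass. */
    public static List<Integer> search(int[] sortKeys, IntPredicate matches) {
        int hits = countHits(sortKeys.length, matches);
        return topSorted(sortKeys, matches, hits);
    }
}
```

Note that the two passes only agree if nothing changes between them, so the sketch assumes both run against the same unchanging view of the data.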


--
Ian.


On Wed, Jun 22, 2011 at 10:13 PM, Tim Eck <t...@terracottatech.com> wrote:
> For the searches I want to run on my index I want to return all matching
> documents (as opposed to N top hits).
>
>
>
> My first naïve approach was just to use Searcher.search(query, filter,
> Integer.MAX_VALUE, sort) – that is, pass Integer.MAX_VALUE for the number
> of possible docs to return. That unfortunately seems to have huge heap
> requirements in org.apache.lucene.util.PriorityQueue.heap as the max docID
> in my index gets large. Multiply that per search heap requirement by a
> handful of concurrent threads and I OOME my server.
>
>
>
> When I don’t need to do any sorting it’s pretty easy to just use my own
> collector to gather the doc ids.  Of course, depending on the number of
> hits, I might still need a good amount of heap, but at least it’s a factor
> of the number of matches (not the index size).
>
>
>
> I’m struggling to figure out how to do the same search but with sorting.
> I’m looking for a method like Searcher.search(Query, Filter, Sort,
> Collector), but perhaps that isn’t a reasonable thing to have, please
> enlighten me if so :-)
>
>
>
> I’m using 3.0.3 lucene-core at the moment but I don’t see that this aspect
> is any different in 3.2.0.
>
>
>
> Hopefully this made sense, any help you can provide is appreciated.
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



