If someone needs all results, they know it beforehand. Why can't they
write this collector themselves? It's trivial, just like you said.

On Sun, Dec 6, 2009 at 01:22, Grant Ingersoll <gsing...@apache.org> wrote:
> At ScaleCamp yesterday in the UK, I was listening to a talk on Xapian and the 
> speaker said one of the optimizations they do when retrieving a large result 
> set is that instead of managing a Priority Queue, they just allocate a large 
> array to hold all of the results and then sort afterward.   Seemed like a 
> good idea since you could avoid a whole slew of PQ operations at the cost of 
> (possibly) some extra memory, so I thought I would see if anyone has looked 
> at doing something similar here.  (Xapian is C++, I believe, so it likely has 
> different perf. characteristics which may or may not matter).
>
> A few things come to mind:
> 1. In many cases, when someone is asking for a lot of results, my guess is 
> they want all the results.  You often see this in people who are doing 
> significant post processing on large result sets (often for machine learning)
> 2. My gut says there is actually some heuristic to be applied here, whereby 
> if the requested number of results is some percentage of the total num. of 
> hits, then do the whole results + sort, otherwise, use PQ.  Thus, this maybe 
> would be useful even when doing something like, say, 500 or a 1000 docs, as 
> opposed to the bigger cases I'm thinking about.  Still, don't need to 
> prematurely optimize.
> 3.  I think we could almost implement this entirely w/ a Collector, except 
> for the sort step, which presumably could be a callback (empty implementation 
> in the base class).
> 4. When doing this, you don't bother to truncate the list when n < totalHits 
> (although see below).
>
> Perhaps, we could also even save having a big array for doc ids and instead 
> just flip bits in a bit set (would still need the array for scores) and then 
> materialize the bit set to an array at the end when the num requested is less 
> than the total number of hits.
>
> Anyone have thoughts on this?  Seems fairly trivial to crank out a Collector 
> to do it, minus the post processing step, which would be relatively trivial 
> to add.
>
> -Grant
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>



-- 
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

Reply via email to