If someone needs all results, they know it beforehand. Why can't they write this collector themselves? It's trivial, just like you said.
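For illustration, here is a minimal sketch of the collect-everything-then-sort idea from the quoted message below. It is plain Java and deliberately does not use Lucene's Collector API: the class name `CollectAllSketch` and its methods `collect(doc, score)`, `totalHits()` and `topDocs(n)` are hypothetical names chosen for this example. It assumes hits arrive as (doc id, score) pairs and that ties should be broken by ascending doc id.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, NOT Lucene's Collector API: append every hit to
 * growable arrays and sort once at the end instead of maintaining a
 * priority queue during collection.
 */
class CollectAllSketch {
    private int[] docs;
    private float[] scores;
    private int size = 0;

    CollectAllSketch(int initialCapacity) {
        int cap = Math.max(1, initialCapacity);
        docs = new int[cap];
        scores = new float[cap];
    }

    /** Record one hit: a plain append, with no per-hit queue maintenance. */
    void collect(int doc, float score) {
        if (size == docs.length) {
            // Double the capacity when full (the "extra memory" trade-off).
            int newCap = docs.length * 2;
            docs = Arrays.copyOf(docs, newCap);
            scores = Arrays.copyOf(scores, newCap);
        }
        docs[size] = doc;
        scores[size] = score;
        size++;
    }

    int totalHits() {
        return size;
    }

    /**
     * Sort all collected hits once (descending score, ties by ascending
     * doc id) and return the doc ids of the best n, or of all hits if
     * n exceeds the number collected.
     */
    int[] topDocs(int n) {
        Integer[] order = new Integer[size];
        for (int i = 0; i < size; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> {
            int byScore = Float.compare(scores[b], scores[a]);
            return byScore != 0 ? byScore : Integer.compare(docs[a], docs[b]);
        });
        int k = Math.min(Math.max(n, 0), size);
        int[] result = new int[k];
        for (int i = 0; i < k; i++) {
            result[i] = docs[order[i]];
        }
        return result;
    }
}
```

Whether this beats a priority queue depends on how large n is relative to the total number of hits. The sketch only shows the shape of the approach and makes no performance claim.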

On Sun, Dec 6, 2009 at 01:22, Grant Ingersoll <gsing...@apache.org> wrote:
> At ScaleCamp yesterday in the UK, I was listening to a talk on Xapian and the
> speaker said one of the optimizations they do when retrieving a large result
> set is that instead of managing a Priority Queue, they just allocate a large
> array to hold all of the results and then sort afterward. Seemed like a
> good idea since you could avoid a whole slew of PQ operations at the cost of
> (possibly) some extra memory, so I thought I would see if anyone has looked
> at doing something similar here. (Xapian is C++, I believe, so it likely has
> different perf. characteristics which may or may not matter).
>
> A few things come to mind:
> 1. In many cases, when someone is asking for a lot of results, my guess is
> they want all the results. You often see this in people who are doing
> significant post processing on large result sets (often for machine learning)
> 2. My gut says there is actually some heuristic to be applied here, whereby
> if the requested number of results is some percentage of the total num. of
> hits, then do the whole results + sort, otherwise, use PQ. Thus, this maybe
> would be useful even when doing something like, say, 500 or a 1000 docs, as
> opposed to the bigger cases I'm thinking about. Still, don't need to
> prematurely optimize.
> 3. I think we could almost implement this entirely w/ a Collector, except
> for the sort step, which presumably could be a callback (empty implementation
> in the base class).
> 4. When doing this, you don't bother to truncate the list when n < totalHits
> (although see below).
>
> Perhaps, we could also even save having a big array for doc ids and instead
> just flip bits in a bit set (would still need the array for scores) and then
> materialize the bit set to an array at the end when the num requested is less
> than the total number of hits.
>
> Anyone have thoughts on this? Seems fairly trivial to crank out a Collector
> to do it, minus the post processing step, which would be relatively trivial
> to add.
>
> -Grant
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>

--
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785