On Dec 6, 2009, at 6:40 AM, Michael McCandless wrote: > I think a hybrid approach may be worth exploring as well. It'd let > you trade off how much transient RAM you're willing to spend... > > Ie, rather than insisting on heapifying after every insertion, allow > many insertions to arrive and then use a selection/partition algorithm > to periodically prune. > > So if your want 10K results, but you're willing to tolerate say 2X the > RAM usage, you could eg allow queue to fill until 20K. When you hit > that 20K mark, you use selection algorithm to find a more accurate > bottom, ie allowing you to prune away the 10K results that are not in > fact competitive. This also defines your new "bottom". For any new > incoming results that come in, if they beat your current bottom, you > accumulate them, up until 20K again, at which point you repeat. > > Then only at the end you sort the results, which would probably be an > optional operation -- ie, perhaps you simply want the top 10K hits but > you don't need them to be sorted, if you'll do some other > post-processing.
I really like this idea, too. Now, to find time to take a crack at it: https://issues.apache.org/jira/browse/LUCENE-2127 --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org