On Dec 6, 2009, at 6:40 AM, Michael McCandless wrote:

> I think a hybrid approach may be worth exploring as well.  It'd let
> you trade off how much transient RAM you're willing to spend...
> 
> Ie, rather than insisting on heapifying after every insertion, allow
> many insertions to arrive and then use a selection/partition algorithm
> to periodically prune.
> 
> So if your want 10K results, but you're willing to tolerate say 2X the
> RAM usage, you could eg allow queue to fill until 20K. When you hit
> that 20K mark, you use selection algorithm to find a more accurate
> bottom, ie allowing you to prune away the 10K results that are not in
> fact competitive.  This also defines your new "bottom". For any new
> incoming results that come in, if they beat your current bottom, you
> accumulate them, up until 20K again, at which point you repeat.
> 
> Then only at the end you sort the results, which would probably be an
> optional operation -- ie, perhaps you simply want the top 10K hits but
> you don't need them to be sorted, if you'll do some other
> post-processing.

I really like this idea, too.  Now, to find time to take a crack at it: 
https://issues.apache.org/jira/browse/LUCENE-2127
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

Reply via email to