Re: Optimizing unordered queries

Michael McCandless Fri, 26 Jun 2009 08:23:49 -0700

On Thu, Jun 25, 2009 at 10:11 PM, Nigel<nigelspl...@gmail.com> wrote:


> Currently we're (perhaps naively) doing the equivalent of
> query.weight(searcher).scorer(reader).score(collector).  Obviously there's a
> certain amount of unnecessary calculation that results from this if you
> don't care about sorting.  Are there any general recommendations for
> unordered searching?  (We already omit norms.)

As of 2.9, scoring is optional with the new Collector.  Ie, if your
Collector doesn't call scorer.score() then the score won't be
computed.

Also, 2.9 will enable out-of-docID-order scoring if your
Collector.acceptsDocsOutOfOrder returns true, but currently it's
disjunction only (plus up to 32 prohibited terms) that will see a
speedup from this.

What kind of queries are you running?

I assume your Collector simply aggregates the list of docIDs hit?  Ie
no sorting, priority queue, etc., in it.

> (More details: Of particular interest are things that access the TermInfos,
> since that's the major source of RAM usage: if a smaller number of TermInfos
> were needed then we could perhaps use an aggressive index divisor setting to
> save RAM without a performance penalty.  For example, I was thinking about a
> custom Similarity implementation that skipped the idf calculations, since
> those have to hit the TermInfos.)

Unfortunately the TermInfos must still be hit to look up the
freq/proxOffset in the postings files.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Optimizing unordered queries

Reply via email to