accelerate hits.id(i) function: eliminating scoring for the sake of efficiency

Boris Galitsky Thu, 11 May 2006 15:07:02 -0700

Yes, thanks Paul.

 We are already using

 getSpans() on the top level SpanQuery, and use a loop
calling next() on the Spans, and ignore duplicate doc() values from the Spans
in that loop.
A counter in the loop would also give you the number of matching occurrences
of the SpanQuery.


I will look into

NearSpansOrdered here  might be a bit faster than the NearSpans


However, what significantly slows us down is the hits.id(i) function.

Can we accelerate it somehow by "cleaning" scoring out of the Lucene code itself?


Best regards
Boris

On Thursday 11 May 2006 22:42, Boris Galitsky wrote:
Hello
We don't need any scoring in our application domain, but efficiency is the key because we are getting tens of thousands of hits for span queries; all these hits are necessary to collect. Is there a simple way to turn scoring off while indexing, while searching, and while delivering document IDs, to save time?
You could use getSpans() on the top level SpanQuery, and use a loop
calling next() on the Spans, and ignore duplicate doc() values from the Spans
in that loop.
A counter in the loop would also give you the number of matching occurrences
of the SpanQuery.
This way of using the Spans directly should be slightly more efficient than
using a HitCollector, but don't hold your breath.
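
The loop described above could look roughly like the sketch below. To keep it
runnable without an index, it does not call Lucene: SpanCursor is a
hypothetical stand-in for the next()/doc() part of Lucene's Spans, and
SpanDocCollector, Result, collect and fromDocs are illustrative names only.
It also assumes that matches arrive in increasing document order, so that
duplicate doc() values are always adjacent.

```java
import java.util.ArrayList;
import java.util.List;

class SpanDocCollector {

    /** Hypothetical stand-in for the next()/doc() part of Lucene's Spans. */
    interface SpanCursor {
        boolean next(); // advance to the next match; false when exhausted
        int doc();      // document number of the current match
    }

    /** Distinct document numbers in encounter order, plus the match count. */
    static final class Result {
        final List<Integer> docIds = new ArrayList<>();
        int occurrences = 0;
    }

    /** Walks the cursor once; no scores are computed anywhere. */
    static Result collect(SpanCursor spans) {
        Result result = new Result();
        int lastDoc = -1; // assumption: real document numbers are never negative
        while (spans.next()) {
            result.occurrences++;        // the counter from the advice above
            int doc = spans.doc();
            if (doc != lastDoc) {        // ignore duplicate doc() values
                result.docIds.add(doc);
                lastDoc = doc;
            }
        }
        return result;
    }

    /** Array-backed cursor, only for trying the loop out without an index. */
    static SpanCursor fromDocs(final int[] docs) {
        return new SpanCursor() {
            private int pos = -1;
            public boolean next() { return ++pos < docs.length; }
            public int doc() { return docs[pos]; }
        };
    }
}
```

With a real index, the cursor would presumably be the Spans returned by
getSpans() on the top level SpanQuery, and next() would additionally need
its IOException handled.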

In case you have ordered SpanQuery's without overlaps, the
NearSpansOrdered here might be a bit faster than the NearSpans
currently in Lucene:
http://issues.apache.org/jira/browse/LUCENE-413
(you'll also need the patch to SpanNearQuery).

Regards,
Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


