Yes, thanks Paul.

We are already using getSpans() on the top-level SpanQuery, with a loop
that calls next() on the Spans and ignores duplicate doc() values.
A counter in that loop also gives us the number of matching occurrences
of the SpanQuery.

I will look into the NearSpansOrdered from LUCENE-413, which might be a bit
faster than the current NearSpans.

However, what slows us down significantly is the hits.id(i) call.
Can we speed it up somehow, for example by stripping scoring out of the Lucene code itself?

Best regards
Boris



On Thursday 11 May 2006 22:42, Boris Galitsky wrote:
Hello

We don't need any scoring in our application domain, but efficiency is key: span queries return tens of thousands of hits, and we need to collect all of them. Is there a simple way to turn scoring off during indexing, during search, and when retrieving document IDs, to save time?

You could use getSpans() on the top-level SpanQuery and loop calling
next() on the Spans, ignoring duplicate doc() values from the Spans
in that loop.
A counter in the loop would also give you the number of matching occurrences
of the SpanQuery.
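A minimal sketch of that loop in plain Java. To stay self-contained it uses a hypothetical SimpleSpans interface (and an in-memory ArraySpans) mirroring the next()/doc()/start()/end() methods of Lucene's Spans; with Lucene itself the Spans would come from spanQuery.getSpans(reader), and next() also throws IOException. It assumes, as with Lucene's Spans, that spans arrive ordered by document, so duplicate doc() values are adjacent and comparing with the previous doc is enough.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in with the same method shape as Lucene's Spans. */
interface SimpleSpans {
    boolean next();
    int doc();
    int start();
    int end();
}

/** In-memory spans from (doc, start, end) triples, assumed sorted by doc, then start. */
class ArraySpans implements SimpleSpans {
    private final int[][] spans;
    private int i = -1;

    ArraySpans(int[][] spans) {
        this.spans = spans;
    }

    public boolean next() { return ++i < spans.length; }
    public int doc() { return spans[i][0]; }
    public int start() { return spans[i][1]; }
    public int end() { return spans[i][2]; }
}

/** Distinct matching documents plus the total number of span occurrences. */
class SpanCollectResult {
    final List<Integer> docs = new ArrayList<>();
    int occurrences = 0;
}

class SpanCollector {
    static SpanCollectResult collect(SimpleSpans spans) {
        SpanCollectResult result = new SpanCollectResult();
        int lastDoc = -1;
        while (spans.next()) {
            result.occurrences++;   // every span is one matching occurrence
            int doc = spans.doc();
            if (doc != lastDoc) {   // spans come in doc order, so duplicates are adjacent
                result.docs.add(doc);
                lastDoc = doc;
            }
        }
        return result;
    }
}
```

Every span is counted but each document is recorded only once, so the occurrence count can exceed the number of documents, and no score is computed anywhere in the loop.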

This way of using the Spans directly should be slightly more efficient than
using a HitCollector, but don't hold your breath.
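For comparison, a sketch of the HitCollector route: a collector that ignores the score and only records document numbers in a BitSet. HitCollectorLike is a hypothetical stand-in with the same collect(int doc, float score) shape as Lucene's HitCollector, so the example compiles without Lucene; with Lucene one would extend HitCollector instead and pass the collector to searcher.search(query, collector). The searcher still computes a score for each hit before calling collect(); the collector only avoids doing anything with it.

```java
import java.util.BitSet;

/** Hypothetical stand-in with the same collect(int, float) shape as Lucene's HitCollector. */
abstract class HitCollectorLike {
    /** Called once per matching document. */
    public abstract void collect(int doc, float score);
}

/** Records matching document numbers and ignores the score entirely. */
class DocIdCollector extends HitCollectorLike {
    private final BitSet docs = new BitSet();

    @Override
    public void collect(int doc, float score) {
        docs.set(doc); // the score argument is deliberately unused
    }

    /** The matching document numbers, one bit per document. */
    BitSet docs() {
        return docs;
    }

    /** Number of distinct matching documents. */
    int count() {
        return docs.cardinality();
    }
}
```

A BitSet keeps memory at one bit per document in the index and makes repeated doc numbers harmless, which suits result sets in the tens of thousands.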

If you have ordered SpanQuerys without overlaps, the
NearSpansOrdered here might be a bit faster than the NearSpans
currently in Lucene:
http://issues.apache.org/jira/browse/LUCENE-413
(you'll also need the patch to SpanNearQuery).
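To illustrate what "ordered without overlaps" means, here is a simplified sketch of ordered near matching inside a single document. It is not the LUCENE-413 NearSpansOrdered code: it assumes every clause is a single term (so each subspan is one position long), takes each clause's sorted positions as input, and treats slop as the total number of positions skipped between consecutive clauses. For each start position of the first clause it picks the earliest later position of each following clause, so the clauses appear in order and never share a position.

```java
import java.util.ArrayList;
import java.util.List;

class OrderedNearMatcher {
    /**
     * positions[i] holds the sorted positions of clause i's term in one document
     * (single-term clauses assumed). Returns [start, end) windows where the clauses
     * occur in order, without overlap, skipping at most maxSlop positions in total.
     */
    static List<int[]> match(int[][] positions, int maxSlop) {
        List<int[]> matches = new ArrayList<>();
        int n = positions.length;
        if (n == 0) return matches;
        int[] next = new int[n]; // per-clause cursors; they only move forward as start grows
        for (int start : positions[0]) {
            int cur = start;
            boolean complete = true;
            for (int i = 1; i < n; i++) {
                int[] p = positions[i];
                // skip positions at or before the previous clause: no overlap, strict order
                while (next[i] < p.length && p[next[i]] <= cur) next[i]++;
                if (next[i] == p.length) { complete = false; break; }
                cur = p[next[i]];
            }
            if (!complete) break; // a later start would need even later positions
            int end = cur + 1;
            int slop = (end - start) - n; // positions skipped between the clauses
            if (slop <= maxSlop) matches.add(new int[]{start, end});
        }
        return matches;
    }
}
```

For example, with positions {0, 5}, {1, 9}, {3, 10} and a slop of 1, only the window [0, 4) matches; the window starting at 5 skips three positions (6, 7 and 8).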

Regards,
Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


