Yes, thanks Paul.
We are already using getSpans() on the top level SpanQuery, with a loop
that calls next() on the Spans and ignores duplicate doc() values.
A counter in that loop also gives us the number of matching occurrences
of the SpanQuery.
I will look into whether NearSpansOrdered might be a bit faster than
the NearSpans.
However, what significantly slows us down is the hits.id(i) call.
Can we speed it up somehow by stripping the scoring out of the Lucene
code itself?
Best regards
Boris
On Thursday 11 May 2006 22:42, Boris Galitsky wrote:
Hello,
We don't need any scoring in our application domain, but efficiency is
key because we get tens of thousands of hits for span queries, and all
of these hits need to be collected.
Is there a simple way to turn scoring off during indexing, during
search, and while retrieving document IDs, to save time?
You could use getSpans() on the top level SpanQuery, and use a loop
calling next() on the Spans, and ignore duplicate doc() values from
the Spans in that loop.
A counter in the loop would also give you the number of matching
occurrences of the SpanQuery.
This way of using the Spans directly should be slightly more efficient
than using a HitCollector, but don't hold your breath.
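The loop described above might look roughly like the sketch below. To keep
it runnable without Lucene on the classpath, it declares a small stand-in
Spans interface containing only next() and doc(); the class
SpanDocCollector, its Result holder, and the fromDocs() helper are
hypothetical names used for illustration. Lucene's own Spans.next()
declares IOException, which the stand-in omits. The sketch assumes that
Spans report matches in increasing document order, so a duplicate doc()
value can be detected by comparing with the previous one.

```java
import java.util.BitSet;

public class SpanDocCollector {

    // Hypothetical stand-in for the two Spans methods the loop needs.
    // In Lucene, next() additionally declares IOException.
    interface Spans {
        boolean next();
        int doc();
    }

    // Holds the distinct matching documents and the total match count.
    static final class Result {
        final BitSet docs = new BitSet();
        int occurrences = 0;
    }

    // Walks the spans once, counting every match and recording each
    // document only the first time it is seen. No scoring is involved.
    static Result collect(Spans spans) {
        Result result = new Result();
        int lastDoc = -1;
        while (spans.next()) {
            result.occurrences++;
            int doc = spans.doc();
            if (doc != lastDoc) {   // ignore duplicate doc() values
                result.docs.set(doc);
                lastDoc = doc;
            }
        }
        return result;
    }

    // Array-backed Spans for demonstration: one entry per match, holding
    // the document number of that match in increasing order.
    static Spans fromDocs(final int[] docs) {
        return new Spans() {
            private int i = -1;
            public boolean next() { return ++i < docs.length; }
            public int doc() { return docs[i]; }
        };
    }

    public static void main(String[] args) {
        Result r = collect(fromDocs(new int[] {2, 2, 5, 9, 9, 9}));
        // prints "3 docs, 6 occurrences"
        System.out.println(r.docs.cardinality() + " docs, "
                + r.occurrences + " occurrences");
    }
}
```

With Lucene itself, the Spans would come from getSpans() on the top level
SpanQuery instead of fromDocs().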
In case you have ordered SpanQueries without overlaps, the
NearSpansOrdered here might be a bit faster than the NearSpans
currently in Lucene:
http://issues.apache.org/jira/browse/LUCENE-413
(you'll also need the patch to SpanNearQuery).
Regards,
Paul Elschot
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]