[ https://issues.apache.org/jira/browse/LUCENE-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643252#action_12643252 ]
Paul Elschot commented on LUCENE-1427:
--------------------------------------
The new Filter API makes it possible to separate two concerns: which data
structure is used to collect the docs in the DocIdSet, and which cached data
structure is used to iterate over this set. That separation is what shows up here.
For backward compatibility QueryWrapperFilter could use an OpenBitSet, which is
good for collecting the docids, but with the new Filter API it is not really
necessary to use a data structure at all (see my initial suggestion).
So the question is how we want to deal with the split between initial
collecting and later repeated iterations. OpenBitSet is certainly good for
collecting, so a good and backward compatible way would be to document the use
of OpenBitSet in the javadocs of QueryWrapperFilter, and let
CachingWrapperFilter decide later which data structure to cache.
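As a rough illustration of this split (not part of the patch): the sketch
below collects into a bit set first and only afterwards decides which form to
keep for repeated iteration. java.util.BitSet stands in for OpenBitSet, and the
class name, method names and the density threshold are all assumptions made up
for this example, not existing Lucene code.

```java
import java.util.BitSet;

// Hypothetical sketch; java.util.BitSet plays the role of OpenBitSet here.
final class CollectThenCacheSketch {

    /** Collect phase: a bit set is cheap to fill, one set() call per hit. */
    static BitSet collect(int maxDoc, int[] matchingDocs) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc : matchingDocs) {
            bits.set(doc);
        }
        return bits;
    }

    /**
     * Cache phase: decide by density which structure to keep.
     * Assumed rule of thumb: a sorted int[] costs 32 bits per hit,
     * a bit set costs 1 bit per document in the index.
     */
    static boolean preferSortedArray(int cardinality, int maxDoc) {
        return (long) cardinality * 32 < maxDoc;
    }

    /** Converts the collected bits into a sorted array of docids. */
    static int[] toSortedArray(BitSet collected) {
        int[] sorted = new int[collected.cardinality()];
        int i = 0;
        for (int doc = collected.nextSetBit(0); doc >= 0; doc = collected.nextSetBit(doc + 1)) {
            sorted[i++] = doc;
        }
        return sorted;
    }
}
```

With something like this, the collecting side would not need to know which
structure ends up in the cache; only the caching side looks at the density.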
The alternative would be to let CachingWrapperFilter always do the initial
collecting, but that would not be backward compatible.
{{instanceof}} could be used in CachingWrapperFilter to decide whether to do
this initial collecting when it is not certain that the given data structure
allows repeated iteration, but it may be better to add a boolean method to
DocIdSet that indicates whether the iterator can be used more than once.
However, that is better left to LUCENE-1296.
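To make the boolean method idea concrete, here is a minimal sketch. It does
not use the real DocIdSet class: SketchDocIdSet, isReusable() and the other
names are hypothetical, and java.util.BitSet again stands in for OpenBitSet.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

// Hypothetical stand-in for DocIdSet with the proposed boolean method.
interface SketchDocIdSet {
    PrimitiveIterator.OfInt iterator();

    /** True when iterator() can be called repeatedly with the same result. */
    boolean isReusable();
}

// Backed by a bit set: every call to iterator() starts from the beginning.
final class BitSetDocIdSet implements SketchDocIdSet {
    private final BitSet bits;

    BitSetDocIdSet(BitSet bits) {
        this.bits = bits;
    }

    public PrimitiveIterator.OfInt iterator() {
        return bits.stream().iterator();
    }

    public boolean isReusable() {
        return true;
    }
}

// Wraps a single iterator, for example one driven directly by a scorer.
final class OneShotDocIdSet implements SketchDocIdSet {
    private final PrimitiveIterator.OfInt it;

    OneShotDocIdSet(int... docs) {
        this.it = IntStream.of(docs).iterator();
    }

    public PrimitiveIterator.OfInt iterator() {
        return it; // the same, possibly exhausted, iterator every time
    }

    public boolean isReusable() {
        return false;
    }
}

final class CachingSketch {
    /** Collects into a bit set only when the given set cannot be re-iterated. */
    static SketchDocIdSet toCacheable(SketchDocIdSet set) {
        if (set.isReusable()) {
            return set;
        }
        BitSet bits = new BitSet();
        PrimitiveIterator.OfInt it = set.iterator();
        while (it.hasNext()) {
            bits.set(it.nextInt());
        }
        return new BitSetDocIdSet(bits);
    }
}
```

The caching side then asks the set itself instead of checking concrete
classes with {{instanceof}}.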
In short, I'd like to have a javadoc remark added to the original patch on the
use of OpenBitSet, and leave the rest to LUCENE-1296.
> QueryWrapperFilter should not do scoring
> ----------------------------------------
>
> Key: LUCENE-1427
> URL: https://issues.apache.org/jira/browse/LUCENE-1427
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 2.9
>
>
> The purpose of QueryWrapperFilter is to simply filter to include the docIDs
> that match the query.
> Its implementation is wasteful now because it computes scores for those
> matching docs even though the score is unused. We could fix this by getting
> a Scorer and iterating through the docs without asking for the score:
> {code}
> Index: src/java/org/apache/lucene/search/QueryWrapperFilter.java
> ===================================================================
> --- src/java/org/apache/lucene/search/QueryWrapperFilter.java (revision 707060)
> +++ src/java/org/apache/lucene/search/QueryWrapperFilter.java (working copy)
> @@ -62,11 +62,9 @@
> public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
> final OpenBitSet bits = new OpenBitSet(reader.maxDoc());
>
> - new IndexSearcher(reader).search(query, new HitCollector() {
> - public final void collect(int doc, float score) {
> - bits.set(doc); // set bit for hit
> - }
> - });
> + final Scorer scorer = query.weight(new IndexSearcher(reader)).scorer(reader);
> + while(scorer.next())
> + bits.set(scorer.doc());
> return bits;
> }
> {code}
> Maybe I'm missing something, but this seems like a simple win?