[jira] Commented: (LUCENE-1427) QueryWrapperFilter should not do scoring

Paul Elschot (JIRA) Tue, 28 Oct 2008 09:42:07 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643252#action_12643252
 ]


Paul Elschot commented on LUCENE-1427:
--------------------------------------

The new Filter API makes it possible to separate two concerns: the data
structure used to collect the docs into the DocIdSet, and the cached data
structure used to iterate over that set. That separation is what shows up here.
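To make the split concrete, here is a minimal sketch in plain Java. It assumes
java.util.BitSet as a stand-in for OpenBitSet and a sorted int[] as one possible
cached form. The class CollectThenCache and its methods are hypothetical and not
part of Lucene.

```java
import java.util.BitSet;

// Hypothetical illustration of keeping two concerns apart: a structure that is
// cheap to fill while collecting docids, and a (possibly different) structure
// that is cheap to iterate repeatedly once cached.
// java.util.BitSet stands in for Lucene's OpenBitSet here.
final class CollectThenCache {

    // Phase 1, collecting: docids may arrive in any order and may repeat;
    // setting a bit is O(1) and removes duplicates for free.
    static BitSet collect(int[] matchingDocs, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc : matchingDocs) {
            bits.set(doc);
        }
        return bits;
    }

    // Phase 2, caching: a sorted int[] can be iterated any number of times
    // and takes less memory than a bit set when few docs match.
    static int[] freeze(BitSet bits) {
        return bits.stream().toArray(); // set bits in ascending order
    }
}
```

Which form is worth caching depends on how dense the matches are, which is the
kind of decision the comment suggests leaving to the caching side.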

For backward compatibility, QueryWrapperFilter could use an OpenBitSet, which is
good for collecting the docids, but with the new Filter API it is not really
necessary to use a data structure at all (see my initial suggestion).
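As an illustration of a doc id set that stores no data structure at all, the
sketch below hands out a fresh iterator from a source on each call instead of
keeping the docids. It is not a reconstruction of the initial suggestion
mentioned above (which is not reproduced here). DocIterator, DocSet, LazyDocSet
and ArrayDocIterator are hypothetical stand-ins modeled loosely on the
next()/doc() iteration style, not Lucene classes.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for an iterator over matching docids (next()/doc() style).
interface DocIterator {
    boolean next();
    int doc();
}

// Hypothetical stand-in for a DocIdSet: all it promises is an iterator.
interface DocSet {
    DocIterator iterator();
}

// A DocSet that materializes nothing: each iterator() call asks the source
// (for example, a fresh pass over a query's matches) for a new iterator.
final class LazyDocSet implements DocSet {
    private final Supplier<DocIterator> source;

    LazyDocSet(Supplier<DocIterator> source) {
        this.source = source;
    }

    @Override
    public DocIterator iterator() {
        return source.get();
    }
}

// Simple source for the illustration: walks a fixed, sorted array of docids.
final class ArrayDocIterator implements DocIterator {
    private final int[] docs;
    private int pos = -1;

    ArrayDocIterator(int[] docs) {
        this.docs = docs;
    }

    @Override
    public boolean next() {
        return ++pos < docs.length;
    }

    @Override
    public int doc() {
        return docs[pos];
    }
}
```

Nothing is computed until iterator() is called, and whether a second call gives
a complete second pass depends entirely on the source.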

So the question is how we want to deal with the split between the initial
collecting and later repeated iterations. OpenBitSet is certainly good for
collecting, so a good and backward-compatible way would be to document the use
of OpenBitSet in the javadocs of QueryWrapperFilter, and let
CachingWrapperFilter decide later which data structure to cache.
The alternative would be to let CachingWrapperFilter always do the initial
collecting, but that would not be backward compatible.

{{instanceof}} could be used in CachingWrapperFilter to decide whether to do
this initial collecting when it is not sure that the given data structure allows
repeated iteration, but it may be better to add a boolean method to DocIdSet
that indicates whether its iterator can be used more than once. However,
that is better left to LUCENE-1296.
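A rough sketch of the boolean-method idea follows, assuming a hypothetical
isReusable() method (no such method is claimed for DocIdSet here): a caching
wrapper keeps a set that it can iterate again, and otherwise does the initial
collecting once itself. The instanceof variant would replace the isReusable()
call with a type check such as `given instanceof ArrayDocs`. All type names are
hypothetical.

```java
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

// Hypothetical DocIdSet stand-in carrying the proposed boolean: does
// iterator() give a fresh, complete pass every time it is called?
interface MaybeReusableDocs {
    PrimitiveIterator.OfInt iterator();
    boolean isReusable(); // hypothetical name, not an existing Lucene method
}

// Already materialized docids: every iterator() call starts over.
final class ArrayDocs implements MaybeReusableDocs {
    private final int[] docs;
    ArrayDocs(int[] docs) { this.docs = docs; }
    public PrimitiveIterator.OfInt iterator() { return IntStream.of(docs).iterator(); }
    public boolean isReusable() { return true; }
}

// Wraps a single iterator (e.g. one pass over a scorer): usable only once.
final class OneShotDocs implements MaybeReusableDocs {
    private final PrimitiveIterator.OfInt it;
    OneShotDocs(PrimitiveIterator.OfInt it) { this.it = it; }
    public PrimitiveIterator.OfInt iterator() { return it; }
    public boolean isReusable() { return false; }
}

// Sketch of the caching decision a CachingWrapperFilter-like class could make.
final class CachingDecision {
    static MaybeReusableDocs toCache(MaybeReusableDocs given) {
        if (given.isReusable()) {
            return given; // safe to cache as-is
        }
        // Not reusable: drain it once into an array that can be iterated again.
        IntStream.Builder collected = IntStream.builder();
        PrimitiveIterator.OfInt it = given.iterator();
        while (it.hasNext()) {
            collected.add(it.nextInt());
        }
        return new ArrayDocs(collected.build().toArray());
    }
}
```

With the flag, the wrapper does not need to know every concrete set type, which
is the advantage over an instanceof check.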

In short, I'd like a javadoc remark on the use of OpenBitSet added to the
original patch, and to leave the rest to LUCENE-1296.

> QueryWrapperFilter should not do scoring
> ----------------------------------------
>
>                 Key: LUCENE-1427
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1427
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>
> The purpose of QueryWrapperFilter is to simply filter to include the docIDs 
> that match the query.
> Its implementation is wasteful now because it computes scores for those 
> matching docs even though the score is unused.  We could fix this by getting 
> a Scorer and iterating through the docs without asking for the score:
> {code}
> Index: src/java/org/apache/lucene/search/QueryWrapperFilter.java
> ===================================================================
> --- src/java/org/apache/lucene/search/QueryWrapperFilter.java (revision 707060)
> +++ src/java/org/apache/lucene/search/QueryWrapperFilter.java (working copy)
> @@ -62,11 +62,9 @@
>    public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
>      final OpenBitSet bits = new OpenBitSet(reader.maxDoc());
>  
> -    new IndexSearcher(reader).search(query, new HitCollector() {
> -      public final void collect(int doc, float score) {
> -        bits.set(doc);  // set bit for hit
> -      }
> -    });
> +    final Scorer scorer = query.weight(new IndexSearcher(reader)).scorer(reader);
> +    while(scorer.next())
> +      bits.set(scorer.doc());
>      return bits;
>    }
> {code}
> Maybe I'm missing something, but this seems like a simple win?
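The difference the patch relies on can be shown with a toy scorer: collecting
through a (doc, score) callback computes a score for every hit, while walking
the matches with next()/doc() never does. ToyScorer and FilterCollect below are
hypothetical illustrations, not Lucene's Scorer or HitCollector, and the score
formula is only a placeholder for real scoring work.

```java
import java.util.BitSet;

// Hypothetical toy scorer over a fixed list of matching docids. It counts how
// often score() runs so the two collection styles can be compared.
final class ToyScorer {
    private final int[] docs;
    private int pos = -1;
    int scoreCalls = 0;

    ToyScorer(int[] docs) { this.docs = docs; }

    boolean next() { return ++pos < docs.length; }
    int doc() { return docs[pos]; }

    float score() {
        scoreCalls++;
        return 1.0f / (1 + docs[pos]); // placeholder for real scoring work
    }
}

final class FilterCollect {
    // Collector style: every hit is delivered with a score, so a score is
    // computed per hit even though the filter discards it.
    static BitSet viaCollector(ToyScorer s, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        while (s.next()) {
            int doc = s.doc();
            s.score(); // computed, then thrown away
            bits.set(doc);
        }
        return bits;
    }

    // Iteration style, as in the patch: only next()/doc() are used.
    static BitSet viaIteration(ToyScorer s, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        while (s.next()) {
            bits.set(s.doc());
        }
        return bits;
    }
}
```

Both styles produce the same set of docids; only the iteration style skips the
per-hit scoring cost.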
