[jira] Commented: (LUCENE-1427) QueryWrapperFilter should not do scoring

Michael McCandless (JIRA) Tue, 28 Oct 2008 12:14:36 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643310#action_12643310
 ]


Michael McCandless commented on LUCENE-1427:
--------------------------------------------

Actually, can't we simply instantiate a new scorer each time iterator() is 
called?  Then we don't need an intermediate OpenBitSet and we can simply return 
the scorer (your original suggestion).

The only problem is... we then need to add "throws IOException" to 
DocIdSet.iterator().  While that is technically a non-back-compatible change 
(callers of DocIdSet.iterator() may suddenly have to add "throws IOException" 
to their method signatures, and so on up the call chain), I think a code 
change would rarely be needed in practice: the iterator's next() method 
already throws IOException, and presumably almost all code that gets an 
iterator then calls next() on it.  Adding this required no changes to 
Lucene's core or contrib sources.  I think it's an acceptable change.
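
As a rough illustration of the back-compat argument, here is a minimal 
self-contained sketch.  SketchIterator, SketchSet and ThrowsSketch are 
hypothetical stand-ins invented for this example, not Lucene classes; they 
only mirror the shape of DocIdSetIterator and DocIdSet.  The caller already 
has to declare IOException because of next(), so the extra "throws" on 
iterator() costs it nothing:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class ThrowsSketch {
  // Hypothetical stand-in for DocIdSetIterator: next() already throws IOException.
  interface SketchIterator {
    boolean next() throws IOException;
    int doc();
  }

  // Hypothetical stand-in for DocIdSet with the proposed "throws IOException".
  interface SketchSet {
    SketchIterator iterator() throws IOException;
  }

  // A typical caller: it must declare IOException anyway because it calls
  // next(), so the new "throws" on iterator() needs no signature change here.
  static List<Integer> collect(SketchSet set) throws IOException {
    List<Integer> docs = new ArrayList<Integer>();
    SketchIterator it = set.iterator();
    while (it.next()) {
      docs.add(it.doc());
    }
    return docs;
  }

  // Small in-memory set, used only to exercise the caller above.
  static SketchSet of(final int... ids) {
    return new SketchSet() {
      public SketchIterator iterator() {
        return new SketchIterator() {
          private int pos = -1;
          public boolean next() { return ++pos < ids.length; }
          public int doc() { return ids[pos]; }
        };
      }
    };
  }
}
```

Only a caller that obtains the iterator without ever calling next() on it 
would need a new "throws" clause or a try/catch.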

Then the patch looks like this:
{code}
Index: src/java/org/apache/lucene/search/DocIdSet.java
===================================================================
--- src/java/org/apache/lucene/search/DocIdSet.java     (revision 708628)
+++ src/java/org/apache/lucene/search/DocIdSet.java     (working copy)
@@ -17,11 +17,12 @@
  * limitations under the License.
  */
 
+import java.io.IOException;
 
 /**
  * A DocIdSet contains a set of doc ids. Implementing classes must provide
  * a {@link DocIdSetIterator} to access the set.
  */
 public abstract class DocIdSet {
-       public abstract DocIdSetIterator iterator();
+       public abstract DocIdSetIterator iterator() throws IOException;
 }
Index: src/java/org/apache/lucene/search/QueryWrapperFilter.java
===================================================================
--- src/java/org/apache/lucene/search/QueryWrapperFilter.java   (revision 708628)
+++ src/java/org/apache/lucene/search/QueryWrapperFilter.java   (working copy)
@@ -59,15 +59,13 @@
     return bits;
   }
   
-  public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
-    final OpenBitSet bits = new OpenBitSet(reader.maxDoc());
-
-    new IndexSearcher(reader).search(query, new HitCollector() {
-      public final void collect(int doc, float score) {
-        bits.set(doc);  // set bit for hit
+  public DocIdSet getDocIdSet(final IndexReader reader) throws IOException {
+    final Weight weight = query.weight(new IndexSearcher(reader));
+    return new DocIdSet() {
+      public DocIdSetIterator iterator() throws IOException {
+        return weight.scorer(reader);
       }
-    });
-    return bits;
+    };
   }
 
   public String toString() {
{code}

I do agree that, longer term, it is worth thinking about clarifying the 
semantics so that some DocIdSets may disallow more than one call to 
iterator(), and then requiring something like CachingWrapperFilter to 
"translate" between different DocIdSets (compact or not, re-iterable, etc.).  
Though, besides this case, which seems easy to fix by just getting another 
scorer in iterator(), are there other places where not having to provide a 
repeatable iterator buys us some compelling freedom?
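
To make the re-iterable versus one-shot distinction concrete, here is a 
minimal sketch under stated assumptions.  All names in it (ReiterableSketch, 
SketchIterator, SketchSet, over, reiterable, oneShot, drain) are hypothetical 
and exist only for this example; none of them are Lucene APIs.  The 
re-iterable set builds a fresh iterator on every iterator() call, analogous 
to asking the Weight for a new scorer each time, while the one-shot set hands 
out the same iterator, so a second pass sees nothing:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class ReiterableSketch {
  // Hypothetical stand-ins mirroring DocIdSetIterator and DocIdSet.
  interface SketchIterator {
    boolean next() throws IOException;
    int doc();
  }

  interface SketchSet {
    SketchIterator iterator() throws IOException;
  }

  // Forward-only iterator over a fixed array of doc ids.
  static SketchIterator over(final int[] ids) {
    return new SketchIterator() {
      private int pos = -1;
      public boolean next() { return ++pos < ids.length; }
      public int doc() { return ids[pos]; }
    };
  }

  // Re-iterable: every iterator() call starts again from the first doc.
  static SketchSet reiterable(final int[] ids) {
    return new SketchSet() {
      public SketchIterator iterator() { return over(ids); }
    };
  }

  // One-shot: every iterator() call returns the same, possibly exhausted, iterator.
  static SketchSet oneShot(final int[] ids) {
    final SketchIterator shared = over(ids);
    return new SketchSet() {
      public SketchIterator iterator() { return shared; }
    };
  }

  // Consume one full pass over the set and return the doc ids seen.
  static List<Integer> drain(SketchSet set) throws IOException {
    List<Integer> docs = new ArrayList<Integer>();
    SketchIterator it = set.iterator();
    while (it.next()) {
      docs.add(it.doc());
    }
    return docs;
  }
}
```

Calling drain() twice on the re-iterable set yields the same docs both times; 
on the one-shot set the second call comes back empty, which is the case 
something like CachingWrapperFilter would have to translate away.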


> QueryWrapperFilter should not do scoring
> ----------------------------------------
>
>                 Key: LUCENE-1427
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1427
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>
> The purpose of QueryWrapperFilter is to simply filter to include the docIDs 
> that match the query.
> Its implementation is wasteful now because it computes scores for those 
> matching docs even though the score is unused.  We could fix this by getting 
> a Scorer and iterating through the docs without asking for the score:
> {code}
> Index: src/java/org/apache/lucene/search/QueryWrapperFilter.java
> ===================================================================
> --- src/java/org/apache/lucene/search/QueryWrapperFilter.java (revision 707060)
> +++ src/java/org/apache/lucene/search/QueryWrapperFilter.java (working copy)
> @@ -62,11 +62,9 @@
>    public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
>      final OpenBitSet bits = new OpenBitSet(reader.maxDoc());
>  
> -    new IndexSearcher(reader).search(query, new HitCollector() {
> -      public final void collect(int doc, float score) {
> -        bits.set(doc);  // set bit for hit
> -      }
> -    });
> +    final Scorer scorer = query.weight(new IndexSearcher(reader)).scorer(reader);
> +    while(scorer.next())
> +      bits.set(scorer.doc());
>      return bits;
>    }
> {code}
> Maybe I'm missing something, but this seems like a simple win?
