[GitHub] msokolov commented on a change in pull request #579: PET: prorated early termination

GitBox Tue, 19 Feb 2019 10:21:59 -0800

msokolov commented on a change in pull request #579: PET: prorated early termination
URL: https://github.com/apache/lucene-solr/pull/579#discussion_r258168161


 ##########
 File path: lucene/core/src/java/org/apache/lucene/search/TopFieldCollector.java
 ##########
 @@ -165,11 +169,35 @@ public void collect(int doc) throws IOException {
               updateMinCompetitiveScore(scorer);
             }
           }
+          if (canEarlyTerminate) {
+              // When early terminating, stop collecting hits from this leaf once we have its prorated hits.
+              if (leafHits > leafHitsThreshold) {
+                  totalHitsRelation = Relation.GREATER_THAN_OR_EQUAL_TO;
+                  throw new CollectionTerminatedException();
+              }
+          }
         }
 
       };
     }
 
+    /** The total number of documents that matched this query; may be a lower bound in case of early termination. */
+    @Override
+    public int getTotalHits() {
+      return totalHits;
+    }
+
+    private int prorateForSegment(int topK, LeafReaderContext leafCtx) {
+        // prorate number of hits to collect based on proportion of documents in this leaf (segment).
+        // p := probability of a top-k document (or any document) being in this segment
+        double p = (double) leafCtx.reader().numDocs() / leafCtx.parent.reader().numDocs();
 
 Review comment:
   Oh yes we do. I guess that would be an empty index. This also made me think 
about deleted docs. It would be better to compute this ratio using live docs, I 
think? Basically we want to use numbers that correspond to what will be 
collected. Can we easily know the number of live docs in a segment and in the 
index?
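
   A minimal sketch of the two points raised above: guarding the empty-index 
case, and computing the ratio from live-document counts. It is a standalone 
illustration, not the code from the pull request. The class name, the plain 
int parameters, and the ceil-and-clamp rounding are all assumptions made for 
this example. In Lucene, IndexReader.numDocs() already excludes deleted 
documents (maxDoc() is the count that includes them), so the values passed in 
here could plausibly come from numDocs() on the leaf reader and on the 
top-level reader.

```java
/** Hypothetical, standalone sketch; not the implementation from the pull request. */
final class ProrateSketch {

    private ProrateSketch() {}

    /**
     * Prorates topK by the fraction of live documents that sit in one segment.
     *
     * @param topK          number of hits requested for the whole index
     * @param leafLiveDocs  live (non-deleted) documents in this segment
     * @param totalLiveDocs live (non-deleted) documents in the whole index
     * @return the number of hits to collect from this segment, between 0 and topK
     */
    static int prorateForSegment(int topK, int leafLiveDocs, int totalLiveDocs) {
        if (topK < 0 || leafLiveDocs < 0 || totalLiveDocs < leafLiveDocs) {
            throw new IllegalArgumentException("inconsistent counts");
        }
        // Empty index (or every document deleted): nothing can be collected,
        // and returning early avoids computing 0/0.
        if (totalLiveDocs == 0) {
            return 0;
        }
        // p := probability that any given live document is in this segment.
        double p = (double) leafLiveDocs / totalLiveDocs;
        // Assumption for this sketch: round up, and never exceed topK.
        int prorated = (int) Math.ceil(p * topK);
        return Math.min(topK, prorated);
    }

    public static void main(String[] args) {
        System.out.println(prorateForSegment(10, 25, 100)); // prints 3
        System.out.println(prorateForSegment(10, 0, 0));    // prints 0
    }
}
```

   Taking plain counts rather than a LeafReaderContext keeps the sketch 
runnable without Lucene on the classpath. A real collector would likely want 
some margin above the plain expectation, since the top hits are not spread 
perfectly evenly across segments.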

