Re: [PR] Optimize FieldExistsQuery.count() when all docs have the field [lucene]

via GitHub Wed, 03 Jun 2026 16:25:29 -0700


msfroh commented on code in PR #16111:
URL: https://github.com/apache/lucene/pull/16111#discussion_r3352444128



##########
lucene/core/src/java/org/apache/lucene/search/FieldExistsQuery.java:
##########
@@ -234,22 +234,43 @@ public int count(LeafReaderContext context) throws 
IOException {
 
           return super.count(context);
         } else if (fieldInfo.hasVectorValues()) { // the field indexes vectors
+          int vectorCount = getVectorValuesSize(fieldInfo, reader);
+          if (vectorCount == reader.maxDoc()) {
+            // All docs have vectors, so the count is just the number of live 
docs.
+            return reader.numDocs();
+          }
           if (reader.hasDeletions() == false) {
-            return getVectorValuesSize(fieldInfo, reader);
+            return vectorCount;
           }
           return super.count(context);
         } else if (fieldInfo.getDocValuesType()
             != DocValuesType.NONE) { // the field indexes doc values
-          if (reader.hasDeletions() == false) {
+          if (fieldInfo.docValuesSkipIndexType() != 
DocValuesSkipIndexType.NONE) {
+            // DocValuesSkipper provides an exact doc count for doc values, so 
we can use it
+            // reliably even in the presence of deletions.
+            DocValuesSkipper docValuesSkipper = 
reader.getDocValuesSkipper(field);
+            if (docValuesSkipper != null) {
+              int docCount = docValuesSkipper.docCount();
+              if (docCount == reader.maxDoc()) {
+                // Every doc has a value for this field, the count is the 
number of live docs.
+                return reader.numDocs();
+              }
+              if (reader.hasDeletions() == false) {
+                return docCount;
+              }
+            }
+          } else if (reader.hasDeletions() == false) {
+            // No deletions: we can use points or terms doc count as a proxy 
for doc values.
             if (fieldInfo.getPointDimensionCount() > 0) {
               PointValues pointValues = reader.getPointValues(field);
-              return pointValues == null ? 0 : pointValues.getDocCount();
+              if (pointValues != null) {
+                return pointValues.getDocCount();
+              }

Review Comment:
   Yeah... losing the zero case for this and the `Terms` check is not ideal.



##########
lucene/core/src/java/org/apache/lucene/search/FieldExistsQuery.java:
##########
@@ -234,22 +234,43 @@ public int count(LeafReaderContext context) throws 
IOException {
 
           return super.count(context);
         } else if (fieldInfo.hasVectorValues()) { // the field indexes vectors
+          int vectorCount = getVectorValuesSize(fieldInfo, reader);
+          if (vectorCount == reader.maxDoc()) {
+            // All docs have vectors, so the count is just the number of live 
docs.
+            return reader.numDocs();
+          }
           if (reader.hasDeletions() == false) {
-            return getVectorValuesSize(fieldInfo, reader);
+            return vectorCount;
           }
           return super.count(context);
         } else if (fieldInfo.getDocValuesType()
             != DocValuesType.NONE) { // the field indexes doc values
-          if (reader.hasDeletions() == false) {
+          if (fieldInfo.docValuesSkipIndexType() != 
DocValuesSkipIndexType.NONE) {
+            // DocValuesSkipper provides an exact doc count for doc values, so 
we can use it
+            // reliably even in the presence of deletions.

Review Comment:
   This comment is a bit confusing. 
   
   I interpreted it as "The `DocValuesSkipper` knows how many live docs have 
the field", which confused me, since it doesn't have access to the live docs 
(since the `DocValuesProducer`'s `SegmentReadState` has `SegmentInfos`, but not 
`SegmentCommitInfos`).
   
   Actually reading the code, it's clear that we're saying "If every document 
in the segment, live or deleted, has the field, then the count is the number of 
live docs."



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Optimize FieldExistsQuery.count() when all docs have the field [lucene]

Reply via email to