jainankitk commented on code in PR #16111:
URL: https://github.com/apache/lucene/pull/16111#discussion_r3352354039
##########
lucene/core/src/java/org/apache/lucene/search/FieldExistsQuery.java:
##########
@@ -234,22 +234,43 @@ public int count(LeafReaderContext context) throws
IOException {
return super.count(context);
} else if (fieldInfo.hasVectorValues()) { // the field indexes vectors
+ int vectorCount = getVectorValuesSize(fieldInfo, reader);
+ if (vectorCount == reader.maxDoc()) {
+ // All docs have vectors, so the count is just the number of live
docs.
+ return reader.numDocs();
+ }
if (reader.hasDeletions() == false) {
- return getVectorValuesSize(fieldInfo, reader);
+ return vectorCount;
}
return super.count(context);
} else if (fieldInfo.getDocValuesType()
!= DocValuesType.NONE) { // the field indexes doc values
- if (reader.hasDeletions() == false) {
+ if (fieldInfo.docValuesSkipIndexType() !=
DocValuesSkipIndexType.NONE) {
+ // DocValuesSkipper provides an exact doc count for doc values, so
we can use it
+ // reliably even in the presence of deletions.
+ DocValuesSkipper docValuesSkipper =
reader.getDocValuesSkipper(field);
+ if (docValuesSkipper != null) {
+ int docCount = docValuesSkipper.docCount();
+ if (docCount == reader.maxDoc()) {
+ // Every doc has a value for this field, the count is the
number of live docs.
+ return reader.numDocs();
+ }
+ if (reader.hasDeletions() == false) {
+ return docCount;
+ }
+ }
+ } else if (reader.hasDeletions() == false) {
+ // No deletions: we can use points or terms doc count as a proxy
for doc values.
if (fieldInfo.getPointDimensionCount() > 0) {
PointValues pointValues = reader.getPointValues(field);
- return pointValues == null ? 0 : pointValues.getDocCount();
+ if (pointValues != null) {
+ return pointValues.getDocCount();
+ }
Review Comment:
new logic falls through to `super.count(context) == -1`, which forces
`IndexSearcher` to do a full scan to arrive at the same 0. Not a correctness
regression, but it does take a constant-time path to a linear one for these
edge cases. Probably fine since the case is rare in practice?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]