msfroh commented on code in PR #16111:
URL: https://github.com/apache/lucene/pull/16111#discussion_r3352444128
##########
lucene/core/src/java/org/apache/lucene/search/FieldExistsQuery.java:
##########
@@ -234,22 +234,43 @@ public int count(LeafReaderContext context) throws
IOException {
return super.count(context);
} else if (fieldInfo.hasVectorValues()) { // the field indexes vectors
+ int vectorCount = getVectorValuesSize(fieldInfo, reader);
+ if (vectorCount == reader.maxDoc()) {
+ // All docs have vectors, so the count is just the number of live
docs.
+ return reader.numDocs();
+ }
if (reader.hasDeletions() == false) {
- return getVectorValuesSize(fieldInfo, reader);
+ return vectorCount;
}
return super.count(context);
} else if (fieldInfo.getDocValuesType()
!= DocValuesType.NONE) { // the field indexes doc values
- if (reader.hasDeletions() == false) {
+ if (fieldInfo.docValuesSkipIndexType() !=
DocValuesSkipIndexType.NONE) {
+ // DocValuesSkipper provides an exact doc count for doc values, so
we can use it
+ // reliably even in the presence of deletions.
+ DocValuesSkipper docValuesSkipper =
reader.getDocValuesSkipper(field);
+ if (docValuesSkipper != null) {
+ int docCount = docValuesSkipper.docCount();
+ if (docCount == reader.maxDoc()) {
+ // Every doc has a value for this field, the count is the
number of live docs.
+ return reader.numDocs();
+ }
+ if (reader.hasDeletions() == false) {
+ return docCount;
+ }
+ }
+ } else if (reader.hasDeletions() == false) {
+ // No deletions: we can use points or terms doc count as a proxy
for doc values.
if (fieldInfo.getPointDimensionCount() > 0) {
PointValues pointValues = reader.getPointValues(field);
- return pointValues == null ? 0 : pointValues.getDocCount();
+ if (pointValues != null) {
+ return pointValues.getDocCount();
+ }
Review Comment:
Yeah... losing the zero case for this and the `Terms` check is not ideal.
##########
lucene/core/src/java/org/apache/lucene/search/FieldExistsQuery.java:
##########
@@ -234,22 +234,43 @@ public int count(LeafReaderContext context) throws
IOException {
return super.count(context);
} else if (fieldInfo.hasVectorValues()) { // the field indexes vectors
+ int vectorCount = getVectorValuesSize(fieldInfo, reader);
+ if (vectorCount == reader.maxDoc()) {
+ // All docs have vectors, so the count is just the number of live
docs.
+ return reader.numDocs();
+ }
if (reader.hasDeletions() == false) {
- return getVectorValuesSize(fieldInfo, reader);
+ return vectorCount;
}
return super.count(context);
} else if (fieldInfo.getDocValuesType()
!= DocValuesType.NONE) { // the field indexes doc values
- if (reader.hasDeletions() == false) {
+ if (fieldInfo.docValuesSkipIndexType() !=
DocValuesSkipIndexType.NONE) {
+ // DocValuesSkipper provides an exact doc count for doc values, so
we can use it
+ // reliably even in the presence of deletions.
Review Comment:
This comment is a bit confusing.
I interpreted it as "The `DocValuesSkipper` knows how many live docs have
the field", which confused me, since it doesn't have access to the live docs
(since the `DocValuesProducer`'s `SegmentReadState` has `SegmentInfos`, but not
`SegmentCommitInfos`).
Actually reading the code, it's clear that we're saying "If every document
in the segment, live or deleted, has the field, then the count is the number of
live docs."
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]