hossman opened a new issue, #13105: URL: https://github.com/apache/lucene/issues/13105
### Description Like most field types in lucene, `Knn(Byte|Float)VectorField` classes do not require a field value for every document -- they support use cases where some documents are indexed w/o a field value. The `(Byte|Float)KnnVectorFieldSource` implementations _generally_ follow the pattern of most field based `ValueSource` impls in terms of how they behave when some documents do not have a value in the specified field (`TestKnnVectorSimilarityFunctions` even checks for this in various tests using the `knnFloatField3` and `knnByteField3` fields which only exist in one document). **_But_** these value sources break if/when a user gets unlucky and a segment exists where _none_ of the documents have value in the vector field (and by extension: when querying an empty index) This is due to the following check... ``` final ByteVectorValues vectorValues = readerContext.reader().getByteVectorValues(fieldName); if (vectorValues == null) { throw new IllegalArgumentException( "no byte vector value is indexed for field '" + fieldName + "'"); ... ``` This problem is trivial to reproduce reliably by adding an `iw.commit();` call to `TestKnnVectorSimilarityFunctions` between indexing the two documents. (The `RandomIndexWriter` should eventually trigger it in this test even w/o modifications .. not sure if it ever has in jenkins?) ---- The appropriate way to implement this type of check is to follow the pattern in the `DocValues.getFoo()` methods, that use a `checkField(...)` method to inspect the `FieldInfos` from the `LeafReader` for consistency w/expectations if `reader.getFooDocValues()` returns null. `checkField(...)` assumes everything is fine if no `FieldInfo` exist for this field in this reader -- using an "empty" impl in it's place. ---- I'm attaching a patch that fixes these `ValueSource` classes, by returning a `VectorFieldFunction` that wraps `DocIdSetIterator.empty()` when `vectorValues` is null, as long as a new `VectorEncoding` method `checkField(reader, fieldName)` does not throw an exception (Adding `checkField` directly to `VectorEncoding` seemed like the most straightforward place to put this ... not sure if there is a more appropriate class?) I also made the new `checkField` method throw `IllegalStateException` instead of `IllegalArgumentException` to be consistent with other "is the index field type consistent with the type of query you are trying to do?" checks in the code base (like `DocValues.checkField(...)` and updated `TestKnnVectorSimilarityFunctions` accordingly. Patch: [VectorFieldSource.fix-missing-check.patch.txt](https://github.com/apache/lucene/files/14284577/VectorFieldSource.fix-missing-check.patch.txt) ### Version and environment details _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org