0ctopus13prime opened a new issue, #14992: URL: https://github.com/apache/lucene/issues/14992
### Description

[MergedByteVectorValues](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java#L404) presents a unified view of the byte vector values from multiple underlying Lucene segments. It exposes an iterator that can advance to a specific document ID. Under this contract, if a caller advances the KnnVectorValues iterator five times and then loads the next byte[], the call is expected to return the vector at the iterator's new position.

This expectation does not hold for MergedByteVectorValues. After advancing N times, attempting to load the next byte[] consistently fails with an error. [MergedFloat32VectorValues](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java#L306) does not have this inconsistency; its behavior is correct and reliable.

The root cause is that MergedByteVectorValues, unlike MergedFloat32VectorValues, does not update its internal lastOrd field when advancing. As a result, when the next vector is loaded, the code checks the current ord against lastOrd, which is still zero, and throws an exception.
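To make the failure mode concrete, here is a minimal, self-contained sketch. It is not the Lucene code: `LastOrdSketch`, `MergedView`, `trackOrdOnAdvance`, `sampleVectors`, and `advanceThenLoad` are hypothetical names, and the exact check and initial value of `lastOrd` are assumptions based only on the description above. It models a forward-only view that guards `vectorValue(ord)` with a `lastOrd` check, once with `lastOrd` updated on advance (as described for the float variant) and once without (as described for the byte variant).

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of the lastOrd check; not Lucene code. */
final class LastOrdSketch {

  /** Stand-in for a merged, forward-only vector view. */
  static final class MergedView {
    private final List<byte[]> vectors;
    // true models the behavior described for the float variant,
    // false models the behavior described for the byte variant.
    private final boolean trackOrdOnAdvance;
    private int index = -1;  // ordinal the iterator currently points at
    private int lastOrd = 0; // ordinal the view believes it is positioned on

    MergedView(List<byte[]> vectors, boolean trackOrdOnAdvance) {
      this.vectors = vectors;
      this.trackOrdOnAdvance = trackOrdOnAdvance;
    }

    /** Moves to the next vector; returns its ordinal, or -1 when exhausted. */
    int nextDoc() {
      if (index + 1 >= vectors.size()) {
        return -1;
      }
      index++;
      if (trackOrdOnAdvance) {
        lastOrd = index; // keep the guard in sync with the iterator
      }
      return index;
    }

    /** Loads the vector at ord, rejecting any ord that differs from lastOrd. */
    byte[] vectorValue(int ord) {
      if (ord != lastOrd) {
        throw new IllegalStateException("ord=" + ord + " but lastOrd=" + lastOrd);
      }
      return vectors.get(ord);
    }
  }

  /** Builds n one-byte vectors whose single value equals their ordinal. */
  static List<byte[]> sampleVectors(int n) {
    List<byte[]> out = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      out.add(new byte[] {(byte) i});
    }
    return out;
  }

  /** Advances the given number of steps, then loads the vector at that position. */
  static byte[] advanceThenLoad(MergedView view, int steps) {
    int ord = -1;
    for (int i = 0; i < steps; i++) {
      ord = view.nextDoc();
    }
    return view.vectorValue(ord);
  }

  public static void main(String[] args) {
    List<byte[]> data = sampleVectors(8);
    byte[] ok = advanceThenLoad(new MergedView(data, true), 5);
    System.out.println("tracking view loaded vector " + ok[0]);
    try {
      advanceThenLoad(new MergedView(data, false), 5);
    } catch (IllegalStateException e) {
      System.out.println("non-tracking view failed: " + e.getMessage());
    }
  }
}
```

In this model, the only difference between the two views is whether `nextDoc()` keeps `lastOrd` in sync with the iterator, which is the behavior this issue asks MergedByteVectorValues to adopt.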
https://github.com/apache/lucene/blob/7fc9fd36095b781cee818f974317e0faf06f7fc0/lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java#L450-L461

To fix this, [MergedByteVectorValues](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java#L404) should increment lastOrd as it advances, mirroring the behavior of [MergedFloat32VectorValues](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java#L306).

https://github.com/apache/lucene/blob/7fc9fd36095b781cee818f974317e0faf06f7fc0/lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java#L340-L352

In OpenSearch, to build vector indexes quickly, we upload KNN vectors to a remote builder component and then trigger the index build. To speed up the upload, the vectors are logically partitioned and sent as a multipart upload, a process that relies on the advance-then-load pattern. Because lastOrd is not updated correctly in MergedByteVectorValues, this multipart upload mechanism currently fails, which blocks fast uploads of byte vectors.

For more details on the OpenSearch use case, see https://github.com/opensearch-project/k-NN/issues/2803

### Version and environment details

_No response_

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org