0ctopus13prime opened a new issue, #14992:
URL: https://github.com/apache/lucene/issues/14992

   ### Description
   
   
[MergedByteVectorValues](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java#L404) represents a unified view of the byte vector values from multiple underlying Lucene segments. It exposes an iterator that can advance to a specific document ID. Under this contract, if a caller advances the KnnVectorValues iterator five times and then loads a byte[], it should get the vector at the iterator's current position.
   
   However, [MergedByteVectorValues](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java#L404) does not honor this contract. After advancing N times without loading the intermediate vectors and then loading a byte[], an exception is consistently thrown. [MergedFloat32VectorValues](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java#L306) does not have this problem; the same advance-then-load sequence works correctly there.
   
   The root cause is that [MergedByteVectorValues](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java#L404) does not update its internal lastOrd field when advancing, unlike [MergedFloat32VectorValues](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java#L306). As a result, when the vector is loaded, the requested ord is checked against a lastOrd that was not moved by the preceding advances, so the check fails and an exception is thrown.
   
   
https://github.com/apache/lucene/blob/7fc9fd36095b781cee818f974317e0faf06f7fc0/lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java#L450-L461
   
   To fix this, [MergedByteVectorValues](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java#L404) should increment lastOrd as the iterator advances, mirroring the behavior of [MergedFloat32VectorValues](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java#L306).
   
   
https://github.com/apache/lucene/blob/7fc9fd36095b781cee818f974317e0faf06f7fc0/lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java#L340-L352
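
   To make the difference concrete, here is a simplified, self-contained Java model of the two lastOrd strategies. It is a hypothetical sketch, not Lucene's code: the names `LastOrdModel`, `ByteLikeValues`, and `FixedValues` are made up, and the byte-side check (`ord == lastOrd + 1`, with lastOrd moved only on load) is an assumption chosen to reproduce the reported symptom.

```java
import java.util.List;

// Hypothetical, simplified model of the lastOrd bookkeeping discussed in this issue.
// This is NOT Lucene's source; all names are illustrative.
final class LastOrdModel {

  // Models the byte variant as reported: advancing leaves lastOrd untouched.
  // Assumption: lastOrd only moves when a vector is loaded, and a load must
  // request exactly the ordinal after the last loaded one.
  static final class ByteLikeValues {
    private final List<byte[]> vectors;
    private int index = -1;   // current iterator position
    private int lastOrd = -1; // last ordinal returned by vectorValue

    ByteLikeValues(List<byte[]> vectors) {
      this.vectors = vectors;
    }

    int nextDoc() {
      return ++index; // advance only; lastOrd is NOT updated here
    }

    byte[] vectorValue(int ord) {
      if (ord != lastOrd + 1) {
        throw new IllegalStateException("ord=" + ord + ", lastOrd=" + lastOrd);
      }
      lastOrd = ord;
      return vectors.get(index);
    }
  }

  // Models the float variant and the proposed fix: advancing increments lastOrd,
  // so a load only has to match the iterator's current position.
  static final class FixedValues {
    private final List<byte[]> vectors;
    private int index = -1;
    private int lastOrd = -1;

    FixedValues(List<byte[]> vectors) {
      this.vectors = vectors;
    }

    int nextDoc() {
      ++lastOrd; // keep lastOrd in step with the iterator
      return ++index;
    }

    byte[] vectorValue(int ord) {
      if (ord != lastOrd) {
        throw new IllegalStateException("ord=" + ord + ", lastOrd=" + lastOrd);
      }
      return vectors.get(index);
    }
  }
}
```

   In this model, advancing five times and then calling `vectorValue(4)` returns the fifth vector on `FixedValues`, but throws `IllegalStateException` on `ByteLikeValues`, because its lastOrd was never moved by the advances. Sequential advance/load/advance/load works on both.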
   
   In OpenSearch, to speed up vector index construction, we upload KNN vectors to a remote builder component and then trigger the index build. To make the upload faster, the vectors are logically partitioned and sent as a multipart upload, a process that relies on the advance-then-load pattern. Because lastOrd is not updated correctly in [MergedByteVectorValues](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java#L404), this multipart upload currently fails, which blocks fast uploads of byte vectors.
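
   A hypothetical sketch of how a partitioned upload might read its parts, assuming each part is read independently by a fresh forward-only cursor that skips to the part's start and then loads. `MultipartSketch`, `VectorCursor`, `partRanges`, and `readPart` are illustrative names, not OpenSearch or Lucene APIs, and the actual k-NN upload path may be organized differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of partitioned reading for a multipart upload.
// None of these names are OpenSearch or Lucene APIs.
final class MultipartSketch {

  // Minimal forward-only cursor with the advance-then-load contract.
  static final class VectorCursor {
    private final List<byte[]> vectors;
    private int ord = -1;

    VectorCursor(List<byte[]> vectors) {
      this.vectors = vectors;
    }

    void advance() {
      ++ord; // move forward without loading
    }

    byte[] load() {
      return vectors.get(ord); // load the vector at the current position
    }
  }

  // Splits [0, total) into contiguous [start, end) ranges of at most partSize vectors.
  static List<int[]> partRanges(int total, int partSize) {
    if (partSize <= 0) {
      throw new IllegalArgumentException("partSize must be positive");
    }
    List<int[]> ranges = new ArrayList<>();
    for (int start = 0; start < total; start += partSize) {
      ranges.add(new int[] {start, Math.min(start + partSize, total)});
    }
    return ranges;
  }

  // Reads one part with a fresh cursor: skip `start` vectors without loading them,
  // then advance and load each vector in [start, end).
  static List<byte[]> readPart(List<byte[]> vectors, int start, int end) {
    VectorCursor cursor = new VectorCursor(vectors);
    for (int i = 0; i < start; i++) {
      cursor.advance();
    }
    List<byte[]> part = new ArrayList<>(end - start);
    for (int i = start; i < end; i++) {
      cursor.advance();
      part.add(cursor.load());
    }
    return part;
  }
}
```

   For example, `partRanges(10, 4)` yields [0, 4), [4, 8) and [8, 10), and `readPart(vectors, 4, 8)` skips four vectors before loading the next four. With a values implementation whose ord bookkeeping lags behind the iterator, as described for MergedByteVectorValues, the first load after the skip is where the exception would surface.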
   
   For more details on the OpenSearch use case, see:
https://github.com/opensearch-project/k-NN/issues/2803
   
   
   ### Version and environment details
   
   _No response_

