[GitHub] [lucene] jtibshirani commented on a change in pull request #601: LUCENE-10375: Write merged vectors to file before building graph

GitBox Wed, 12 Jan 2022 22:16:34 -0800


jtibshirani commented on a change in pull request #601:
URL: https://github.com/apache/lucene/pull/601#discussion_r783657144




##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsWriter.java
##########
@@ -145,6 +139,64 @@ public void writeField(FieldInfo fieldInfo, KnnVectorsReader knnVectorsReader)
       throw new IllegalArgumentException(
           "Indexing an HNSW graph requires a random access vector values, got " + vectors);
     }
+
+    long vectorDataLength = vectorData.getFilePointer() - vectorDataOffset;
+    long vectorIndexLength = vectorIndex.getFilePointer() - vectorIndexOffset;
+    writeMeta(
+        fieldInfo,
+        vectorDataOffset,
+        vectorDataLength,
+        vectorIndexOffset,
+        vectorIndexLength,
+        count,
+        docIds);
+    writeGraphOffsets(meta, offsets);
+  }
+
+  @Override
+  public void mergeField(FieldInfo fieldInfo, MergeState mergeState) throws IOException {
+    if (mergeState.infoStream.isEnabled("VV")) {
+      mergeState.infoStream.message("VV", "merging " + mergeState.segmentInfo);
+    }
+
+    writeVectorDataPadding();
+    long vectorDataOffset = vectorData.getFilePointer();
+
+    // write the merged vector data to a temporary file
+    VectorValues vectors = MergedVectorValues.mergeVectorValues(fieldInfo, mergeState);
+    IndexOutput tempVectorData =

Review comment:
       It felt funny to be using a temp file here, since we write out the same 
data to the real vector data file. I had some trouble seeing how we'd open it 
for reading in the middle of this merge... I need to look into it more.
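
For illustration only, here is a minimal sketch of the general pattern being discussed: write the merged vectors to a temporary file, close it, reopen it read-only for random access, and copy the same bytes into the real data file. It uses plain `java.nio` rather than Lucene's own I/O classes, and every name in it (`TempVectorFileSketch`, `mergeViaTempFile`, `vectors.vec`) is hypothetical. It is not the code in this pull request and does not answer how the real vector data file could be reopened mid-merge.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: java.nio channels stand in for Lucene's index inputs and outputs.
final class TempVectorFileSketch {

  /** Appends each vector to the channel as little-endian floats. */
  static void writeVectors(FileChannel out, float[][] vectors) throws IOException {
    for (float[] v : vectors) {
      ByteBuffer buf = ByteBuffer.allocate(v.length * Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
      buf.asFloatBuffer().put(v); // fills the bytes; buf's own position stays at 0
      while (buf.hasRemaining()) {
        out.write(buf);
      }
    }
  }

  /** Reads the vector with ordinal ord (dimension dim) using absolute positioning. */
  static float[] readVector(FileChannel in, int ord, int dim) throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(dim * Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
    long start = (long) ord * dim * Float.BYTES;
    while (buf.hasRemaining()) {
      if (in.read(buf, start + buf.position()) < 0) {
        throw new EOFException("vector " + ord + " is past the end of the file");
      }
    }
    buf.flip();
    float[] v = new float[dim];
    buf.asFloatBuffer().get(v);
    return v;
  }

  /**
   * Writes vectors to a temp file, reopens it for reading, copies it to the
   * final file, and returns one vector read back by ordinal (the kind of
   * random access a graph builder would need).
   */
  static float[] mergeViaTempFile(Path dir, float[][] vectors, int probeOrd) throws IOException {
    Path temp = Files.createTempFile(dir, "vectors", ".tmp");
    Path real = dir.resolve("vectors.vec");
    try {
      // Step 1: write and close the temp file so its contents are complete.
      try (FileChannel out = FileChannel.open(temp, StandardOpenOption.WRITE)) {
        writeVectors(out, vectors);
      }
      // Step 2: reopen read-only, copy to the real file, then read at random.
      try (FileChannel in = FileChannel.open(temp, StandardOpenOption.READ);
          FileChannel dst = FileChannel.open(real, StandardOpenOption.CREATE,
              StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
        long size = in.size();
        long copied = 0;
        while (copied < size) {
          copied += in.transferTo(copied, size - copied, dst);
        }
        return readVector(in, probeOrd, vectors[0].length);
      }
    } finally {
      Files.deleteIfExists(temp); // the temp file is only needed during the merge
    }
  }
}
```

The cost of this pattern is the one the comment points at: the same bytes are written twice, once to the temp file and once to the final file.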




