Re: [PR] Normalize written scalar quantized vectors when using cosine similarity [lucene]

via GitHub Wed, 08 Nov 2023 04:59:31 -0800


benwtrent commented on code in PR #12780:
URL: https://github.com/apache/lucene/pull/12780#discussion_r1386577696



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java:
##########
@@ -143,6 +143,13 @@ private void 
writeQuantizedVectors(QuantizationFieldVectorWriter fieldData) thro
     byte[] vector = new byte[fieldData.dim];
     final ByteBuffer offsetBuffer = 
ByteBuffer.allocate(Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
     for (float[] v : fieldData.floatVectors) {
+      if (fieldData.normalize) {
+        float[] copy = new float[v.length];

Review Comment:
   we should only have one `copy` array created. Outside the loop check if we 
need it, if so, initialize and then you can use it here in the loop. Seems 
better to do that than create a new `float[dim]` for every vector.



##########
lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseKnnVectorsFormatTestCase.java:
##########
@@ -81,7 +81,8 @@ public void init() {
   protected void addRandomFields(Document doc) {
     switch (vectorEncoding) {
       case BYTE -> doc.add(new KnnByteVectorField("v2", randomVector8(30), 
similarityFunction));
-      case FLOAT32 -> doc.add(new KnnFloatVectorField("v2", randomVector(30), 
similarityFunction));
+      case FLOAT32 -> doc.add(
+          new KnnFloatVectorField("v2", randomNormalizedVector(30), 
similarityFunction));

Review Comment:
   Were you getting many failures with your change when you put non-normalized 
vectors? The fix to normalize on cosine should be transparent. The caller 
should never be aware that it happens as it only occurs for things that are 
then quantized.



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java:
##########
@@ -196,6 +203,13 @@ void 
writeSortedQuantizedVectors(QuantizationFieldVectorWriter fieldData, int[]
     final ByteBuffer offsetBuffer = 
ByteBuffer.allocate(Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
     for (int ordinal : ordMap) {
       float[] v = fieldData.floatVectors.get(ordinal);
+      if (fieldData.normalize) {
+        float[] copy = new float[v.length];

Review Comment:
   Similar to above ^



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Normalize written scalar quantized vectors when using cosine similarity [lucene]

Reply via email to