benwtrent commented on code in PR #12780:
URL: https://github.com/apache/lucene/pull/12780#discussion_r1386577696
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java:
##########
@@ -143,6 +143,13 @@ private void
writeQuantizedVectors(QuantizationFieldVectorWriter fieldData) thro
byte[] vector = new byte[fieldData.dim];
final ByteBuffer offsetBuffer =
ByteBuffer.allocate(Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
for (float[] v : fieldData.floatVectors) {
+ if (fieldData.normalize) {
+ float[] copy = new float[v.length];
Review Comment:
we should only have one `copy` array created. Outside the loop check if we
need it, if so, initialize and then you can use it here in the loop. Seems
better to do that than create a new `float[dim]` for every vector.
##########
lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseKnnVectorsFormatTestCase.java:
##########
@@ -81,7 +81,8 @@ public void init() {
protected void addRandomFields(Document doc) {
switch (vectorEncoding) {
case BYTE -> doc.add(new KnnByteVectorField("v2", randomVector8(30),
similarityFunction));
- case FLOAT32 -> doc.add(new KnnFloatVectorField("v2", randomVector(30),
similarityFunction));
+ case FLOAT32 -> doc.add(
+ new KnnFloatVectorField("v2", randomNormalizedVector(30),
similarityFunction));
Review Comment:
Were you getting many failures with your change when you put non-normalized
vectors? The fix to normalize on cosine should be transparent. The caller
should never be aware that it happens as it only occurs for things that are
then quantized.
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java:
##########
@@ -196,6 +203,13 @@ void
writeSortedQuantizedVectors(QuantizationFieldVectorWriter fieldData, int[]
final ByteBuffer offsetBuffer =
ByteBuffer.allocate(Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
for (int ordinal : ordMap) {
float[] v = fieldData.floatVectors.get(ordinal);
+ if (fieldData.normalize) {
+ float[] copy = new float[v.length];
Review Comment:
Similar to above ^
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]