kaivalnp commented on code in PR #16068:
URL: https://github.com/apache/lucene/pull/16068#discussion_r3304441678
##########
lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/faiss/FaissKnnVectorsWriter.java:
##########
@@ -197,6 +209,53 @@ public long ramBytesUsed() {
return rawVectorsWriter.ramBytesUsed();
}
+ private static class ByteToFloatVectorValues extends FloatVectorValues {
+ private final ByteVectorValues bytes;
+ private final DocIdSet docIdSet;
+
+ public ByteToFloatVectorValues(ByteVectorValues bytes) {
+ this(bytes, null);
+ }
+
+ public ByteToFloatVectorValues(ByteVectorValues bytes, DocIdSet docIdSet) {
+ this.bytes = bytes;
+ this.docIdSet = docIdSet;
+ }
+
+ @Override
+ public float[] vectorValue(int ord) throws IOException {
+ byte[] b = bytes.vectorValue(ord);
+ float[] f = new float[b.length];
Review Comment:
This allocates a `new float[]` for every vector. Can we re-use a single array instead?
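
A minimal sketch of the re-use idea, assuming callers consume each returned vector before requesting the next one and do not hold on to the reference. `ReusableByteToFloat` and `convert` are hypothetical names for illustration and are not part of this PR or of Lucene:

```java
// Hypothetical helper: widens byte vectors into one shared float[] buffer
// instead of allocating a new array per vector.
final class ReusableByteToFloat {
  // Scratch buffer allocated once, sized to the vector dimension.
  private final float[] scratch;

  ReusableByteToFloat(int dimension) {
    this.scratch = new float[dimension];
  }

  // Copies (and widens) each byte into the shared buffer and returns it.
  // The returned array is overwritten by the next call.
  float[] convert(byte[] b) {
    if (b.length != scratch.length) {
      throw new IllegalArgumentException(
          "expected dimension " + scratch.length + ", got " + b.length);
    }
    for (int i = 0; i < b.length; i++) {
      scratch[i] = b[i];
    }
    return scratch;
  }
}
```

The trade-off is that the returned array is only valid until the next call, so any consumer that needs to keep a vector has to copy it.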
##########
lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/faiss/FaissLibraryNativeImpl.java:
##########
@@ -277,29 +265,39 @@ private interface FloatToFloatFunction {
private final FloatToFloatFunction scaler;
private boolean closed;
- private Index(MemorySegment indexPointer) {
+ private Index(
+ MemorySegment indexPointer, VectorSimilarityFunction function, VectorEncoding encoding) {
this.arena = Arena.ofShared();
this.indexPointer =
indexPointer
// Ensure timely cleanup
.reinterpret(arena, wrapper::faiss_Index_free);
- // Get underlying function
- int metricType = wrapper.faiss_Index_metric_type(indexPointer);
- VectorSimilarityFunction function = metricToFunction(metricType);
+ int dimension = wrapper.faiss_Index_d(indexPointer);
// Scale Faiss distances to Lucene scores, see VectorSimilarityFunction.java
this.scaler =
switch (function) {
- case DOT_PRODUCT ->
- // distance in Faiss === dotProduct in Lucene
- distance -> Math.max((1 + distance) / 2, 0);
+ case DOT_PRODUCT -> {
+ if (encoding == VectorEncoding.BYTE) {
+ float denom = (float) (dimension * (1 << 15));
+ yield distance -> 0.5f + distance / denom;
+ } else {
+ yield distance -> Math.max((1 + distance) / 2, 0);
Review Comment:
Can we re-use the [scaling functions](https://github.com/apache/lucene/blob/366dc3c0396bba2ba634670c1592926c75bcf8ae/lucene/core/src/java/org/apache/lucene/util/VectorUtil.java#L411) here?
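
A rough sketch of what factoring the scaling into shared helpers could look like. The helpers below are local stand-ins that copy the two formulas from the diff above; their names (`ScoreScalers`, `normalizeToUnitInterval`, `scaleByteDotProduct`, `dotProductScaler`) are assumptions for illustration and are not claimed to match the signatures of the linked `VectorUtil` functions:

```java
// Hypothetical holder for the two DOT_PRODUCT scaling formulas in the diff.
final class ScoreScalers {
  // Same shape as the private interface in the diff.
  interface FloatToFloatFunction {
    float apply(float value);
  }

  // Float vectors: maps a dot product in [-1, 1] onto [0, 1], clamped at 0.
  static float normalizeToUnitInterval(float dotProduct) {
    return Math.max((1 + dotProduct) / 2, 0);
  }

  // Byte vectors: scales by dimension * 2^15, as in the BYTE branch above.
  static float scaleByteDotProduct(float dotProduct, int dimension) {
    float denom = (float) (dimension * (1 << 15));
    return 0.5f + dotProduct / denom;
  }

  // Picks one scaler up front so the per-hit path is a single call.
  static FloatToFloatFunction dotProductScaler(boolean byteEncoded, int dimension) {
    if (byteEncoded) {
      return distance -> scaleByteDotProduct(distance, dimension);
    }
    return ScoreScalers::normalizeToUnitInterval;
  }
}
```

If equivalent helpers are already public in `VectorUtil`, the switch arms could delegate to them directly, which would keep the Faiss scores from drifting if the core formulas ever change.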
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]