Hi Michael,

The max vector dimension limit is no longer checked in the field type, as it is the responsibility of the codec to enforce it.

You need to build your own codec that returns a different setting so it can be enforced by IndexWriter. See Apache Solr's code for how to wrap the existing KnnVectorsFormat so that it returns another limit: <https://github.com/apache/solr/blob/6d50c592fb0b7e0ea2e52ecf1cde7e882e1d0d0a/solr/core/src/java/org/apache/solr/core/SchemaCodecFactory.java#L159-L183>

Basically, you need to subclass Lucene95Codec as done here: <https://github.com/apache/solr/blob/6d50c592fb0b7e0ea2e52ecf1cde7e882e1d0d0a/solr/core/src/java/org/apache/solr/core/SchemaCodecFactory.java#L99-L146> and return a different vectors format, such as a delegator as described above.
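
As a rough sketch of that approach (this is not the Solr code itself; the class names HighDimCodec and DelegatingKnnVectorsFormat and the chosen limit are made up for illustration, and it assumes Lucene 9.8 with the default Lucene95HnswVectorsFormat as the delegate):

```java
import java.io.IOException;
import org.apache.lucene.codecs.KnnVectorsFormat;
import org.apache.lucene.codecs.KnnVectorsReader;
import org.apache.lucene.codecs.KnnVectorsWriter;
import org.apache.lucene.codecs.lucene95.Lucene95Codec;
import org.apache.lucene.codecs.lucene95.Lucene95HnswVectorsFormat;
import org.apache.lucene.index.SegmentReadState;
import org.apache.lucene.index.SegmentWriteState;

/** Hypothetical codec that reports a caller-chosen max vector dimension. */
final class HighDimCodec extends Lucene95Codec {

  private final KnnVectorsFormat vectorsFormat;

  HighDimCodec(int maxDimensions) {
    // Wrap the stock HNSW format; only the reported limit changes.
    this.vectorsFormat =
        new DelegatingKnnVectorsFormat(new Lucene95HnswVectorsFormat(), maxDimensions);
  }

  @Override
  public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
    // Same format for every field in this sketch.
    return vectorsFormat;
  }

  /** Hypothetical delegator: forwards reading and writing, overrides the limit. */
  static final class DelegatingKnnVectorsFormat extends KnnVectorsFormat {
    private final KnnVectorsFormat delegate;
    private final int maxDimensions;

    DelegatingKnnVectorsFormat(KnnVectorsFormat delegate, int maxDimensions) {
      // Reuse the delegate's name so the written segments still refer to the
      // registered format.
      super(delegate.getName());
      this.delegate = delegate;
      this.maxDimensions = maxDimensions;
    }

    @Override
    public KnnVectorsWriter fieldsWriter(SegmentWriteState state) throws IOException {
      return delegate.fieldsWriter(state);
    }

    @Override
    public KnnVectorsReader fieldsReader(SegmentReadState state) throws IOException {
      return delegate.fieldsReader(state);
    }

    @Override
    public int getMaxDimensions(String fieldName) {
      return maxDimensions;
    }
  }
}
```

The codec would then be set on the writer configuration, for example new IndexWriterConfig().setCodec(new HighDimCodec(2048)), before creating the IndexWriter. The value 2048 is only an example; whether HNSW performs acceptably at that size is something to measure for your own data.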

The responsibility was shifted to the codec because there may be better alternatives to HNSW that have different limits, especially with regard to performance during merging and query response times, e.g. BKD trees.

Uwe

On 19.10.2023 at 10:53, Michael Wechner wrote:
I forgot to mention that using the custom FieldType with a vector dimension of 1536 does work with Lucene 9.7.0.

Thanks

Michael



On 19.10.23 at 10:39, Michael Wechner wrote:
Hi

I recently upgraded Lucene to 9.8.0 and was running tests with OpenAI's embedding model, which has a vector dimension of 1536, and received the following error:

Field[vector]vector's dimensions must be <= [1024]; got 1536

whereas this previously worked with the hack of overriding the vector dimension using a custom FieldType:

float[] vector = ...
FieldType vectorFieldType = new CustomVectorFieldType(vector.length, VectorSimilarityFunction.COSINE);

and setting

KnnFloatVectorField vectorField = new KnnFloatVectorField("VECTOR_FIELD", vector, vectorFieldType);

But this does not seem to work anymore with Lucene 9.8.0.

Is this hack now prevented by the Lucene code itself, or is there any idea how to make this work again?

Whatever one thinks of OpenAI, the embedding model "text-embedding-ada-002" is really good, and it is sad that one cannot use it with Lucene because of the 1024 dimension restriction.

Thanks

Michael



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de

