True :-) When you are the one controlling the vectors that go in, a method to disable the maximum limit would be sufficient.

But I could imagine a case where you offer Lucene as a service, where people can for example configure their own "sentence embedding models", and you would like to offer a maximum limit different from the default of 1024. For that case I think a method to reset the maximum limit would make sense. Examples could be a service like OpenAI's, or vector search databases such as Weaviate or Pinecone.
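
A minimal sketch of what such a resettable limit could look like. This is not Lucene code: the class VectorDimensionLimit and all of its methods are hypothetical names made up for illustration, and only the default of 1024 and the wording of the error message come from this thread.

```java
// Hypothetical sketch, NOT part of Lucene. All names are illustrative only.
final class VectorDimensionLimit {
  /** Default limit, taken from the error message quoted in this thread. */
  static final int DEFAULT_MAX_DIMENSIONS = 1024;

  // volatile so that a change made by one thread is visible to the others
  private static volatile int maxDimensions = DEFAULT_MAX_DIMENSIONS;

  private VectorDimensionLimit() {}

  static int getMaxDimensions() {
    return maxDimensions;
  }

  /** Replaces the limit; going above the default is at the caller's own risk. */
  static void setMaxDimensions(int newMax) {
    if (newMax <= 0) {
      throw new IllegalArgumentException("max dimensions must be > 0; got " + newMax);
    }
    maxDimensions = newMax;
  }

  /** Restores the default limit. */
  static void resetToDefault() {
    maxDimensions = DEFAULT_MAX_DIMENSIONS;
  }

  /** Rejects vectors that have more dimensions than the current limit. */
  static void check(int numDimensions) {
    int max = maxDimensions; // read once so the message matches the comparison
    if (numDimensions > max) {
      throw new IllegalArgumentException(
          "vector numDimensions must be <= " + max + "; got " + numDimensions);
    }
  }

  public static void main(String[] args) {
    try {
      check(12288); // rejected under the default limit
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
    setMaxDimensions(16384); // a service operator raises the limit
    check(12288); // now accepted
    System.out.println("accepted with limit " + getMaxDimensions());
    resetToDefault();
  }
}
```

Note that the limit here is global state for the whole JVM, so every index in the process sees the same value. A service that needs different limits per customer would probably want to pass the limit in as configuration instead.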

Thanks

Michael




On 15.02.22 at 23:34, Michael Sokolov wrote:
I don't think it makes sense to have a static variable maximum that you can change by calling a method. What purpose would it serve?

On Tue, Feb 15, 2022, 2:39 PM Michael Wechner <michael.wech...@wyona.com> wrote:

    Hi Alessandro

    No, I have not created a Jira ticket, but I would be happy to
    create one, just let me know or please feel free to create one.

    I understand the concerns about the limits in general and I think
    it makes sense to have a default max dimensions limit, but I could
    imagine it needs to be increased eventually and being able to
    increase it programmatically and at your own risk will help people
    using Lucene.

    Thanks

    Michael

    On 15.02.22 at 19:22, Alessandro Benedetti wrote:
    Hi Michael,
    let's create a Jira ticket to use a higher value (if you haven't
    already).
    I would be happy to review the patch or do it myself, but only
    after 10/03.
    Once the pull request is ready (including Javadoc documentation
    that clearly states that if you go above X it's at your own
    risk), we'll also involve Michael Sokolov and the other
    committers familiar with this area of the code.

    Cheers

    --------------------------
    Alessandro Benedetti
    Apache Lucene/Solr PMC member and Committer
    Director, R&D Software Engineer, Search Consultant

    www.sease.io <http://www.sease.io>


    On Sat, 12 Feb 2022 at 22:53, Michael Wechner
    <michael.wech...@wyona.com> wrote:

        Hi

        I just tried to test the OpenAI model
        "text-similarity-davinci-001" with 12288 dimensions and
        received the following error:

        java.lang.IllegalArgumentException: vector numDimensions must be <= VectorValues.MAX_DIMENSIONS (=1024); got 12288
                at org.apache.lucene.document.FieldType.setVectorDimensionsAndSimilarityFunction(FieldType.java:381) ~[lucene-core-9.0.0.jar:9.0.0 0b18b3b965cedaf5eb129aa41243a44c83ca826d - jpountz - 2021-12-01 14:23:49]
                at org.apache.lucene.document.KnnVectorField.createFieldType(KnnVectorField.java:69) ~[lucene-core-9.0.0.jar:9.0.0 0b18b3b965cedaf5eb129aa41243a44c83ca826d - jpountz - 2021-12-01 14:23:49]

        IIUC I cannot programmatically increase the max vector size,
        which is set inside
        lucene/core/src/java/org/apache/lucene/index/VectorValues.java

          public static int MAX_DIMENSIONS = 1024;

        right?

        I guess I could rebuild Lucene with a greater size, or what
        are the possibilities to increase the max vector size?

        Thanks

        Michael


