Re: [VOTE] Dimension Limit for KNN Vectors

Michael Wechner Tue, 16 May 2023 05:02:32 -0700

+1 to Gus' reply.

I think that Robert's veto or anyone else's veto is fair enough, but I also think that anyone who is vetoing should be very clear about the objectives / goals to be achieved, in order to get a +1.

If no clear objectives / goals can be defined and agreed on, then the whole thing becomes arbitrary.

Therefore I would also be interested to know which objectives / goals need to be met so that there will be a +1 on this vote.


Thanks

Michael



On 16.05.23 at 13:45, Gus Heck wrote:

Robert,

Can you explain in clear technical terms the standard that must be met for performance? A benchmark that must run in X time on Y hardware, for example (and why that test is suitable)? Or some other reproducible criteria? So far I've heard you give an *opinion* that it's unusable, but that's not a technical criterion; others may have a different concept of what is usable to them.

Forgive me if I misunderstand, but the essence of your argument has seemed to be

"Performance isn't good enough, therefore we should force anyone who wants to experiment with something bigger to fork the code base to do it"

Thus, it is necessary to have a clear, unambiguous standard that anyone can verify for "good enough". A clear standard would also focus efforts at improvement.


Where are the goal posts?

FWIW I'm +1 on any of options 2-4, since I believe the existence of a hard limit is fundamentally counterproductive in an open source setting, as it will lead to *fewer people* pushing the limits. Extremely few people are going to get into the nitty-gritty of optimizing things unless they are staring at code that they can prove does something interesting, but doesn't run fast enough for their purposes. If people hit a hard limit, more of them give up and never develop the code that will motivate them to look for optimizations.


-Gus

On Tue, May 16, 2023 at 6:04 AM Robert Muir <rcm...@gmail.com> wrote:

    I still feel -1 (veto) on increasing this limit. Sending more
    emails does not change the technical facts or make the veto go away.

    On Tue, May 16, 2023 at 4:50 AM Alessandro Benedetti
    <a.benede...@sease.io> wrote:

        Hi all,
        we have finalized all the options proposed by the community
        and we are ready to vote for the preferred one and then
        proceed with the implementation.

        *Option 1*
        Keep it as it is (dimension limit hardcoded to 1024)
        *Motivation*:
        We are close to improving on many fronts. Given the
        criticality of Lucene in computing infrastructure and the
        concerns raised by one of the most active stewards of the
        project, I think we should keep working toward improving the
        feature as is and move to raise the limit after we can
        demonstrate improvement unambiguously.

        *Option 2*
        Make the limit configurable, for example through a system property.
        *Motivation*:
        The system administrator can enforce a limit that its users need to
        respect, in line with whatever the admin has decided is
        acceptable for them.
        The default can stay the current one.
        This should open the doors for Apache Solr, Elasticsearch,
        OpenSearch, and any sort of plugin development.

        *Option 3*
        Move the max dimension limit down to the HNSW-specific
        implementation. Once there, this limit would not bind any
        other potential vector engine alternative/evolution.
        *Motivation:* There seem to be contradictory performance
        interpretations about the current HNSW implementation. Some
        consider its performance OK, some not, and it depends on the
        target data set and use case. Increasing the max dimension
        limit where it is currently (in the top-level FloatVectorValues)
        would not allow potential alternatives (e.g. for other
        use cases) to be based on a lower limit.

        *Option 4*
        Make it configurable and move it to an appropriate place.
        In particular, a
        simple Integer.getInteger("lucene.hnsw.maxDimensions", 1024)
        should be enough.
        *Motivation*:
        Both are good and not mutually exclusive and could happen in
        any order.
        Someone suggested refining what the _default_ limit should
        be, but I've not seen an argument _against_ configurability.
        Especially in this way -- a toggle that doesn't bind Lucene's
        APIs in any way.

        I'll keep this [VOTE] open for a week and then proceed to the
        implementation.
        --------------------------
        *Alessandro Benedetti*
        Director @ Sease Ltd.
        Apache Lucene/Solr Committer
        Apache Solr PMC Member

        e-mail: a.benede...@sease.io

        *Sease* - Information Retrieval Applied
        Consulting | Training | Open Source

        Website: Sease.io <http://sease.io/>
        LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
        <https://twitter.com/seaseltd> | Youtube
        <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> |
        Github <https://github.com/seaseltd>



--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
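For readers weighing the options, here is a minimal Java sketch of Option 4 as worded above: the limit lives on an HNSW-specific type (the Option 3 idea) and its default of 1024 can be overridden through the `lucene.hnsw.maxDimensions` system property read with `Integer.getInteger` (the Option 2 idea). Only the property name, the default of 1024 and the `Integer.getInteger` call come from the thread; `DimensionLimitSketch`, `VectorIndexFormat`, `HnswFormat` and `checkDimension` are hypothetical names chosen for illustration and are not Lucene APIs.

```java
import java.util.Objects;

/**
 * Hypothetical sketch: a per-implementation dimension limit whose default
 * can be overridden by a system property. None of these types exist in Lucene.
 */
public class DimensionLimitSketch {

    /** Hypothetical abstraction over a vector index implementation. */
    interface VectorIndexFormat {
        String name();

        /** Maximum vector dimension this particular implementation accepts. */
        int maxDimensions();
    }

    /** Hypothetical HNSW format: the limit is owned here, not by a top-level class. */
    static final class HnswFormat implements VectorIndexFormat {
        static final int DEFAULT_MAX_DIMENSIONS = 1024;
        static final String MAX_DIMENSIONS_PROPERTY = "lucene.hnsw.maxDimensions";

        private final int maxDimensions;

        /** Reads the limit from the system property, falling back to 1024. */
        HnswFormat() {
            // Integer.getInteger returns the default when the property is unset
            // or its value cannot be parsed as an integer.
            this(Integer.getInteger(MAX_DIMENSIONS_PROPERTY, DEFAULT_MAX_DIMENSIONS));
        }

        /** Explicit limit, e.g. chosen by an application or a plugin. */
        HnswFormat(int maxDimensions) {
            if (maxDimensions <= 0) {
                throw new IllegalArgumentException("maxDimensions must be positive: " + maxDimensions);
            }
            this.maxDimensions = maxDimensions;
        }

        @Override
        public String name() {
            return "hnsw";
        }

        @Override
        public int maxDimensions() {
            return maxDimensions;
        }
    }

    /** Validates a field's dimension against whichever format the field uses. */
    static void checkDimension(VectorIndexFormat format, int dimension) {
        Objects.requireNonNull(format, "format");
        if (dimension <= 0 || dimension > format.maxDimensions()) {
            throw new IllegalArgumentException(
                "dimension " + dimension + " out of range (1.." + format.maxDimensions()
                    + ") for format " + format.name());
        }
    }

    public static void main(String[] args) {
        VectorIndexFormat hnsw = new HnswFormat();
        // Accepted as long as the configured limit is at least 768.
        checkDimension(hnsw, 768);
        System.out.println(hnsw.name() + " max dimensions: " + hnsw.maxDimensions());
    }
}
```

With no property set, the sketch behaves like today's hardcoded 1024; running it with `-Dlucene.hnsw.maxDimensions=2048` raises the limit for that JVM only. Note that `Integer.getInteger` silently falls back to the default when the value is not a valid integer, so a typo in the property would go unnoticed; a real implementation might prefer to fail loudly. Because the limit belongs to the format rather than to a top-level class such as FloatVectorValues, another (hypothetical) vector engine could declare a higher or lower maximum without changing the HNSW default.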
