+1 to Gus' reply.

I think that Robert's veto, or anyone else's, is fair enough, but anyone who vetoes should state very clearly which objectives / goals must be achieved in order to get a +1.

If no clear objectives / goals can be defined and agreed on, then the whole thing becomes arbitrary.

Therefore I would also be interested to know which objectives / goals must be met for this vote to get a +1.

Thanks

Michael



On 16.05.23 at 13:45, Gus Heck wrote:
Robert,

Can you explain in clear technical terms the standard that must be met for performance? A benchmark that must run in X time on Y hardware, for example (and why that test is suitable)? Or some other reproducible criteria? So far I've heard you give an *opinion* that it's unusable, but that's not a technical criterion; others may have a different concept of what is usable to them.

Forgive me if I misunderstand, but the essence of your argument has seemed to be

"Performance isn't good enough, therefore we should force anyone who wants to experiment with something bigger to fork the code base to do it"

Thus, it is necessary to have a clear unambiguous standard that anyone can verify for "good enough". A clear standard would also focus efforts at improvement.

Where are the goal posts?

FWIW I'm +1 on any of 2-4 since I believe the existence of a hard limit is fundamentally counterproductive in an open source setting, as it will lead to *fewer people* pushing the limits. Extremely few people are going to get into the nitty-gritty of optimizing things unless they are staring at code that they can prove does something interesting, but doesn't run fast enough for their purposes. If people hit a hard limit, more of them give up and never develop the code that will motivate them to look for optimizations.

-Gus

On Tue, May 16, 2023 at 6:04 AM Robert Muir <rcm...@gmail.com> wrote:

    I am still -1 (veto) on increasing this limit. Sending more
    emails does not change the technical facts or make the veto go away.

    On Tue, May 16, 2023 at 4:50 AM Alessandro Benedetti
    <a.benede...@sease.io> wrote:

        Hi all,
        we have finalized all the options proposed by the community
        and we are ready to vote for the preferred one and then
        proceed with the implementation.

        *Option 1*
        Keep it as it is (dimension limit hardcoded to 1024)
        *Motivation*:
        We are close to improving on many fronts. Given the
        criticality of Lucene in computing infrastructure and the
        concerns raised by one of the most active stewards of the
        project, I think we should keep working toward improving the
        feature as is and move to raise the limit after we can
        demonstrate improvement unambiguously.

        *Option 2*
        make the limit configurable, for example through a system property
        *Motivation*:
        The system administrator can enforce a limit that users must
        respect, in line with whatever the admin has decided is
        acceptable for them.
        The default can stay the current one.
        This should open the doors for Apache Solr, Elasticsearch,
        OpenSearch, and any sort of plugin development

        *Option 3*
        Move the max dimension limit down a level, to an HNSW-specific
        implementation. Once there, this limit would not bind any
        other potential vector engine alternative/evolution.
        *Motivation*:
        There seem to be contradictory performance
        interpretations about the current HNSW implementation. Some
        consider its performance ok, some not, and it depends on the
        target data set and use case. Increasing the max dimension
        limit where it is currently (in top level FloatVectorValues)
        would not allow potential alternatives (e.g. for other
        use-cases) to be based on a lower limit.

        *Option 4*
        Make it configurable and move it to an appropriate place.
        In particular, a
        simple Integer.getInteger("lucene.hnsw.maxDimensions", 1024)
        should be enough.
        *Motivation*:
        Options 2 and 3 are both good, not mutually exclusive, and
        could happen in any order.
        Someone suggested refining what the _default_ limit should
        be, but I've not seen an argument _against_ configurability.
        Especially in this way -- a toggle that doesn't bind Lucene's
        APIs in any way.
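
For illustration only, here is a minimal sketch of how the one-liner proposed in Option 4 could behave. It assumes the property name from the proposal, "lucene.hnsw.maxDimensions", and the current default of 1024; the class name MaxDimensionsSketch and the checkDimension helper are hypothetical and are not part of Lucene's API.

```java
// Sketch only: a configurable limit read from a JVM system property.
// The class and helper names are hypothetical, not Lucene code.
class MaxDimensionsSketch {
    // Property name as written in the Option 4 proposal (an assumption here).
    static final String PROPERTY = "lucene.hnsw.maxDimensions";
    // Current hardcoded limit mentioned in Option 1.
    static final int DEFAULT_MAX_DIMENSIONS = 1024;

    // Integer.getInteger returns the default when the property is
    // unset or cannot be parsed as an integer.
    static int maxDimensions() {
        return Integer.getInteger(PROPERTY, DEFAULT_MAX_DIMENSIONS);
    }

    // Hypothetical validation helper: rejects vectors above the configured limit.
    static void checkDimension(int dimension) {
        int max = maxDimensions();
        if (dimension <= 0 || dimension > max) {
            throw new IllegalArgumentException(
                "vector dimension must be in [1, " + max + "], got " + dimension);
        }
    }

    public static void main(String[] args) {
        // Run with -Dlucene.hnsw.maxDimensions=2048 to raise the limit.
        System.out.println("max dimensions: " + maxDimensions());
    }
}
```

Because the value is read through a system property, an administrator could raise or lower it at JVM startup without any change to public APIs, which is the point Option 4 makes about configurability.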

        I'll keep this [VOTE] open for a week and then proceed to the
        implementation.
        --------------------------
        *Alessandro Benedetti*
        Director @ Sease Ltd.
        /Apache Lucene/Solr Committer/
        /Apache Solr PMC Member/

        e-mail: a.benede...@sease.io

        *Sease* - Information Retrieval Applied
        Consulting | Training | Open Source

        Website: Sease.io <http://sease.io/>
        LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
        <https://twitter.com/seaseltd> | Youtube
        <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> |
        Github <https://github.com/seaseltd>



--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
