+1 to Gus' reply.
I think that Robert's veto or anyone else's veto is fair enough, but I
also think that anyone who is vetoing should be very clear about the
objectives / goals to be achieved, in order to get a +1.
If no clear objectives / goals can be defined and agreed on, then the
whole thing becomes arbitrary.
Therefore I would also be interested to know which objectives / goals
must be met for there to be a +1 on this vote.
Thanks
Michael
On 16.05.23 at 13:45, Gus Heck wrote:
Robert,
Can you explain in clear technical terms the standard that must be met
for performance? A benchmark that must run in X time on Y hardware for
example (and why that test is suitable)? Or some other reproducible
criteria? So far I've heard you give an *opinion* that it's unusable,
but that's not a technical criterion; others may have a different
concept of what is usable to them.
Forgive me if I misunderstand, but the essence of your argument has
seemed to be:
"Performance isn't good enough, therefore we should force anyone who
wants to experiment with something bigger to fork the code base to do it"
Thus, it is necessary to have a clear unambiguous standard that anyone
can verify for "good enough". A clear standard would also focus
efforts at improvement.
Where are the goal posts?
FWIW I'm +1 on any of options 2-4, since I believe the existence of a hard
limit is fundamentally counterproductive in an open source setting, as
it will lead to *fewer people* pushing the limits. Extremely few
people are going to get into the nitty-gritty of optimizing things
unless they are staring at code that they can prove does something
interesting, but doesn't run fast enough for their purposes. If people
hit a hard limit, more of them give up and never develop the code that
will motivate them to look for optimizations.
-Gus
On Tue, May 16, 2023 at 6:04 AM Robert Muir <rcm...@gmail.com> wrote:
I still feel -1 (veto) on increasing this limit. Sending more
emails does not change the technical facts or make the veto go away.
On Tue, May 16, 2023 at 4:50 AM Alessandro Benedetti
<a.benede...@sease.io> wrote:
Hi all,
we have finalized all the options proposed by the community
and we are ready to vote for the preferred one and then
proceed with the implementation.
*Option 1*
Keep it as it is (dimension limit hardcoded to 1024)
*Motivation*:
We are close to improving on many fronts. Given the
criticality of Lucene in computing infrastructure and the
concerns raised by one of the most active stewards of the
project, I think we should keep working toward improving the
feature as is and move to raise the limit after we can
demonstrate improvement unambiguously.
*Option 2*
Make the limit configurable, for example through a system property
*Motivation*:
The system administrator can enforce a limit that its users need to
respect, in line with whatever the admin has decided is
acceptable for them.
The default can stay the current one.
This should open the doors for Apache Solr, Elasticsearch,
OpenSearch, and any sort of plugin development.
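A minimal sketch of Option 2 in Java, assuming a hypothetical property name `lucene.maxVectorDimensions` (the option does not name one) and a hypothetical helper class `VectorDimensionLimit`; neither is an existing Lucene API. The default stays at the current 1024, and the configured value is parsed strictly so that a typo fails loudly instead of silently falling back to the default.

```java
// Hypothetical helper, NOT a Lucene API: resolves the maximum vector
// dimension an administrator allows via a JVM system property.
final class VectorDimensionLimit {
    // Today's hardcoded limit, kept as the default.
    static final int DEFAULT_MAX_DIMENSIONS = 1024;

    // Hypothetical property name; the proposal does not fix one.
    static final String PROPERTY = "lucene.maxVectorDimensions";

    private VectorDimensionLimit() {}

    // null or blank means "not configured": use the default.
    // Anything else must parse as a positive int, otherwise we throw
    // (NumberFormatException is a subclass of IllegalArgumentException).
    static int resolve(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return DEFAULT_MAX_DIMENSIONS;
        }
        int value = Integer.parseInt(configured.trim());
        if (value <= 0) {
            throw new IllegalArgumentException(PROPERTY + " must be positive, got " + value);
        }
        return value;
    }

    // Reads the system property each time it is called.
    static int maxDimensions() {
        return resolve(System.getProperty(PROPERTY));
    }

    // Rejects vectors that are empty or wider than the configured limit.
    static void checkDimension(int dimension) {
        int limit = maxDimensions();
        if (dimension <= 0 || dimension > limit) {
            throw new IllegalArgumentException(
                "vector dimension " + dimension + " must be in [1, " + limit + "]");
        }
    }
}
```

An administrator would then start the JVM with, e.g., `-Dlucene.maxVectorDimensions=2048`; users who set nothing keep the 1024 limit.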
*Option 3*
Move the max dimension limit to a lower level, into an HNSW-specific
implementation. Once there, this limit would not bind any
other potential vector engine alternative/evolution.
*Motivation*:
There seem to be contradictory performance
interpretations about the current HNSW implementation. Some
consider its performance ok, some not, and it depends on the
target data set and use case. Increasing the max dimension
limit where it currently lives (in the top-level FloatVectorValues)
would not allow potential alternatives (e.g. for other
use cases) to be based on a lower limit.
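To illustrate the structure Option 3 describes, here is a rough Java sketch in which each vector engine declares its own limit, so raising the HNSW limit does not impose it on other engines (and vice versa). `VectorEngine`, `HnswEngine` and `AlternativeEngine` are hypothetical names for illustration only, not Lucene's actual codec or format classes.

```java
// Hypothetical abstraction, NOT Lucene's API: the limit is a property of
// each engine rather than a single top-level constant.
interface VectorEngine {
    String name();

    int maxDimensions();

    // Shared validation; each engine supplies its own bound.
    default void checkDimension(int dimension) {
        if (dimension <= 0 || dimension > maxDimensions()) {
            throw new IllegalArgumentException(
                name() + " supports 1.." + maxDimensions() + " dimensions, got " + dimension);
        }
    }
}

// HNSW-specific engine carrying today's 1024 limit.
final class HnswEngine implements VectorEngine {
    public String name() { return "hnsw"; }

    public int maxDimensions() { return 1024; }
}

// A hypothetical alternative engine free to pick its own limit,
// e.g. a lower one suited to a different use case.
final class AlternativeEngine implements VectorEngine {
    private final int maxDimensions;

    AlternativeEngine(int maxDimensions) {
        this.maxDimensions = maxDimensions;
    }

    public String name() { return "alternative"; }

    public int maxDimensions() { return maxDimensions; }
}
```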
*Option 4*
Make it configurable and move it to an appropriate place.
In particular, a
simple Integer.getInteger("lucene.hnsw.maxDimensions", 1024)
should be enough.
*Motivation*:
Both changes are good, are not mutually exclusive, and could happen
in any order.
Someone suggested refining what the _default_ limit should
be, but I've not seen an argument _against_ configurability,
especially not against this form of it: a toggle that doesn't bind
Lucene's APIs in any way.
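A minimal sketch of Option 4, assuming a hypothetical HNSW-specific holder class `HnswVectorLimits`; only the property name `lucene.hnsw.maxDimensions` and the default of 1024 come from the proposal. `Integer.getInteger` already returns the default when the property is unset or not a valid integer, so a non-positive value is the only case this sketch rejects explicitly.

```java
// Hypothetical HNSW-specific holder for the limit, NOT an actual Lucene class.
final class HnswVectorLimits {
    // Property name as proposed in the vote.
    static final String PROPERTY = "lucene.hnsw.maxDimensions";
    static final int DEFAULT = 1024;

    private HnswVectorLimits() {}

    // Integer.getInteger(String, int) falls back to DEFAULT when the
    // property is missing or unparseable; it does not reject values <= 0,
    // so that check is done here.
    static int maxDimensions() {
        int limit = Integer.getInteger(PROPERTY, DEFAULT);
        if (limit <= 0) {
            throw new IllegalStateException(PROPERTY + " must be positive, got " + limit);
        }
        return limit;
    }
}
```

Reading the property on every call keeps the sketch easy to test; a real implementation might read it once into a static field.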
I'll keep this [VOTE] open for a week and then proceed to the
implementation.
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
Apache Lucene/Solr Committer
Apache Solr PMC Member
e-mail: a.benede...@sease.io
*Sease* - Information Retrieval Applied
Consulting | Training | Open Source
Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
<https://twitter.com/seaseltd> | Youtube
<https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> |
Github <https://github.com/seaseltd>
--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)