Hi Alessandro
Thank you very much for summarizing and starting the vote.
I am not sure I really understand the difference between Option 2 and
Option 4. Is it just about implementation details?
Thanks
Michael
On 16.05.23 at 10:50, Alessandro Benedetti wrote:
Hi all,
we have finalized all the options proposed by the community and we are
ready to vote for the preferred one and then proceed with the
implementation.
*Option 1*
Keep it as it is (dimension limit hardcoded to 1024)
*Motivation*:
We are close to improving on many fronts. Given the criticality of
Lucene in computing infrastructure and the concerns raised by one of
the most active stewards of the project, I think we should keep
working toward improving the feature as is and move to raise the limit
after we can demonstrate improvement unambiguously.
*Option 2*
Make the limit configurable, for example through a system property.
*Motivation*:
The system administrator can enforce a limit that users need to
respect, in line with whatever the admin has decided is acceptable for
them.
The default can stay the current one.
This should open the doors for Apache Solr, Elasticsearch, OpenSearch,
and any sort of plugin development.
*Option 3*
Move the max dimension limit to a lower level, into an HNSW-specific
implementation. Once there, this limit would not bind any other
potential vector engine alternative/evolution.
*Motivation*:
There seem to be contradictory performance
interpretations about the current HNSW implementation. Some consider
its performance ok, some not, and it depends on the target data set
and use case. Increasing the max dimension limit where it is currently
(in top level FloatVectorValues) would not allow
potential alternatives (e.g. for other use-cases) to be based on a
lower limit.
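To make the idea behind Option 3 more concrete, here is a minimal
sketch of a limit owned by each vector format instead of a shared
top-level class. All type names below (VectorFormatSketch,
HnswFormatSketch, WiderFormatSketch, DimensionCheck) and the 4096 value
are hypothetical and chosen only for illustration; this is not Lucene's
actual API.

```java
// Hypothetical illustration only: none of these types exist in Lucene.
interface VectorFormatSketch {
    /** Each format declares the largest dimension it is willing to index. */
    int maxDimensions();
}

/** Stand-in for an HNSW-based format that keeps the current limit of 1024. */
final class HnswFormatSketch implements VectorFormatSketch {
    @Override
    public int maxDimensions() {
        return 1024;
    }
}

/** Stand-in for some alternative engine that chooses a different limit (assumed value). */
final class WiderFormatSketch implements VectorFormatSketch {
    @Override
    public int maxDimensions() {
        return 4096;
    }
}

final class DimensionCheck {
    private DimensionCheck() {}

    /** Validates a dimension against the limit of the format in use, not a global constant. */
    public static void validate(VectorFormatSketch format, int dimension) {
        int max = format.maxDimensions();
        if (dimension <= 0 || dimension > max) {
            throw new IllegalArgumentException(
                "vector dimension must be in [1, " + max + "]; got " + dimension);
        }
    }

    public static void main(String[] args) {
        validate(new HnswFormatSketch(), 768);    // accepted by the HNSW stand-in
        validate(new WiderFormatSketch(), 2048);  // accepted only by the wider stand-in
        System.out.println("both dimensions accepted by their formats");
    }
}
```

With this shape, raising or lowering the limit for one implementation
would not affect the others.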
*Option 4*
Make it configurable and move it to an appropriate place.
In particular, a simple
Integer.getInteger("lucene.hnsw.maxDimensions", 1024) should be enough.
*Motivation*:
Both changes (making the limit configurable and moving it) are good,
are not mutually exclusive, and could happen in any order.
Someone suggested refining what the _default_ limit should be, but
I've not seen an argument _against_ configurability. Especially in
this way -- a toggle that doesn't bind Lucene's APIs in any way.
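As a rough sketch of how the Integer.getInteger call from Option 4
could be used, the snippet below reads the limit from the
"lucene.hnsw.maxDimensions" system property and falls back to 1024.
The class name MaxDimensionsConfig and the checkDimension helper are
assumptions made for this example, not existing Lucene code.

```java
// Hypothetical helper; only the property name and the default of 1024 come from the proposal.
final class MaxDimensionsConfig {
    static final String PROPERTY = "lucene.hnsw.maxDimensions";
    static final int DEFAULT_MAX_DIMENSIONS = 1024;

    private MaxDimensionsConfig() {}

    /** Returns the configured limit, or the default if the property is unset or not a valid integer. */
    public static int maxDimensions() {
        return Integer.getInteger(PROPERTY, DEFAULT_MAX_DIMENSIONS);
    }

    /** Rejects dimensions outside [1, configured limit]. */
    public static void checkDimension(int dimension) {
        int max = maxDimensions();
        if (dimension <= 0 || dimension > max) {
            throw new IllegalArgumentException(
                "vector dimension must be in [1, " + max + "]; got " + dimension);
        }
    }

    public static void main(String[] args) {
        checkDimension(768); // within the default limit
        System.out.println("max dimensions = " + maxDimensions());
    }
}
```

An administrator could then raise the limit at JVM startup, for example
with -Dlucene.hnsw.maxDimensions=2048, without any API change.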
I'll keep this [VOTE] open for a week and then proceed to the
implementation.
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
/Apache Lucene/Solr Committer/
/Apache Solr PMC Member/
e-mail: a.benede...@sease.io
*Sease* - Information Retrieval Applied
Consulting | Training | Open Source
Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
<https://twitter.com/seaseltd> | Youtube
<https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
<https://github.com/seaseltd>