> 10 MB hard drive, wow I'll never need another floppy disk ever...
> Neural nets... nice idea, but there will never be enough CPU power to run
> them...
>
> etc.
>
> Is it possible to make it a configurable limit?

I think Gus is spot on, agree 100%. Vector dimension is already
configurable; it's the max dimension which is hard-coded. Just bear in mind
that this MAX limit is not used in initializing data structures, but only
to raise an exception. As far as I know, if we change the limit and you
have small vectors, you won't be impacted at all.

On Thu, 6 Apr 2023, 03:31 Gus Heck, <gus.h...@gmail.com> wrote:

> 10 MB hard drive, wow I'll never need another floppy disk ever...
> Neural nets... nice idea, but there will never be enough CPU power to run
> them...
>
> etc.
>
> Is it possible to make it a configurable limit?
>
> On Wed, Apr 5, 2023 at 4:51 PM Jack Conradson <osjdcon...@gmail.com>
> wrote:
>
>> I don't want to get too far off topic, but I think one of the problems
>> here is that HNSW doesn't really fit well as a Lucene data structure. The
>> way it behaves, it would be better supported as a live, in-memory data
>> structure instead of segmented and written to disk as tiny graphs that
>> then need to be merged. I wonder if it may be a better approach to explore
>> other possible algorithms that are designed to be on-disk instead of
>> in-memory, even if they require k-means clustering as a trade-off. Maybe
>> with an on-disk algorithm we could have good enough performance for a
>> higher-dimensional limit.
>>
>> On Wed, Apr 5, 2023 at 10:54 AM Robert Muir <rcm...@gmail.com> wrote:
>>
>>> I'd ask anyone voting +1 to raise this limit to at least try to index
>>> a few million vectors with 756 or 1024 dimensions, which is allowed
>>> today.
>>>
>>> IMO, based on how painful it is, it seems the limit is already too
>>> high. I realize that will sound controversial, but please at least try
>>> it out!
>>>
>>> Voting +1 without at least doing this is really the
>>> "weak/unscientifically minded" approach.
>>>
>>> On Wed, Apr 5, 2023 at 12:52 PM Michael Wechner
>>> <michael.wech...@wyona.com> wrote:
>>> >
>>> > Thanks for your feedback!
>>> >
>>> > I agree that it should not crash.
>>> >
>>> > So far we did not experience crashes ourselves, but we did not index
>>> > millions of vectors.
>>> >
>>> > I will try to reproduce the crash; maybe this will help us to move
>>> forward.
>>> >
>>> > Thanks
>>> >
>>> > Michael
>>> >
>>> > On 05.04.23 at 18:30, Dawid Weiss wrote:
>>> > >> Can you describe your crash in more detail?
>>> > > I can't. That experiment was a while ago and a quick test to see if I
>>> > > could index rather large-ish USPTO (patent office) data as vectors.
>>> > > Couldn't do it then.
>>> > >
>>> > >> How much RAM?
>>> > > My indexing jobs run with rather smallish heaps to give space for I/O
>>> > > buffers. Think 4-8 GB at most. So yes, it could have been the problem.
>>> > > I recall segment merging grew slower and slower and then simply
>>> > > crashed. Lucene should work with low heap requirements, even if it
>>> > > slows down. Throwing RAM at the indexing/segment-merging problem
>>> > > is... I don't know - not elegant?
>>> > >
>>> > > Anyway, my main point was to remind folks about how Apache works -
>>> > > code is merged in when there are no vetoes. If Rob (or anybody else)
>>> > > remains unconvinced, he or she can block the change. (I didn't invent
>>> > > those rules.)
>>> > >
>>> > > D.
>>> > >
>>> > > ---------------------------------------------------------------------
>>> > > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> > > For additional commands, e-mail: dev-h...@lucene.apache.org
>>> > >
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
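[Editor's note: the point made at the top of the thread - that the max-dimension limit is consulted only to raise an exception and is never used to size any data structure - can be sketched as follows. This is a hypothetical, simplified illustration, not Lucene's actual code; the class and method names are invented, and only the 1024 value comes from the thread.]

```java
// Hypothetical sketch of a hard-coded max-dimension check. The key property
// discussed in the thread: MAX_DIMENSIONS gates field creation with an
// exception but never allocates anything, so raising it cannot affect
// users who index small vectors.
public class VectorDimensionCheck {

    // Analogous to the hard-coded limit discussed in the thread (1024).
    static final int MAX_DIMENSIONS = 1024;

    // Pure predicate: is this per-field dimension within the limit?
    static boolean withinLimit(int dimension) {
        return dimension > 0 && dimension <= MAX_DIMENSIONS;
    }

    // The limit is enforced only here, by throwing; storage elsewhere is
    // sized by the actual per-field dimension, not by MAX_DIMENSIONS.
    static void validateDimension(int dimension) {
        if (!withinLimit(dimension)) {
            throw new IllegalArgumentException(
                "vector dimension must be in [1, " + MAX_DIMENSIONS
                    + "], got: " + dimension);
        }
    }

    public static void main(String[] args) {
        validateDimension(768); // accepted: within the limit
        boolean rejected = false;
        try {
            validateDimension(2048); // exceeds the hard-coded limit
        } catch (IllegalArgumentException e) {
            rejected = true;
        }
        System.out.println("oversize vector rejected: " + rejected);
    }
}
```

Under this model, making the limit configurable (or simply raising the constant) changes only the threshold at which the exception fires, which is why small-vector users would see no behavioral difference.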