Re: Sub-Graphs in Hnsw

Ravikumar Govindarajan Wed, 04 Jun 2025 00:51:56 -0700

>
> I wonder if you could influence the graph search by incorporating the
> partition key (customer id?) to the vectors somehow? If this was done
> well it should lead to a natural clustering of the graph.
>


I can explore this further. Thanks for the pointers.
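
A minimal sketch of that idea, assuming squared Euclidean distance. The
`partition_code` and `augment` helpers and the `weight` and `dims` parameters
are hypothetical names made up for illustration, not Lucene APIs. It appends a
few extra dimensions derived from the partition key, so vectors from the same
user keep their original distances while vectors from different users are
pushed apart:

```python
import hashlib
import math
import random


def partition_code(key, dims=8):
    """Deterministic pseudo-random unit vector derived from the partition key."""
    seed = int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(dims)]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]


def augment(vector, key, weight=2.0, dims=8):
    """Append the scaled partition code to the original vector (assumed scheme)."""
    return list(vector) + [weight * c for c in partition_code(key, dims)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


a, b = [0.1, 0.2, 0.3], [0.4, 0.1, 0.0]
same = sq_dist(augment(a, "User1"), augment(b, "User1"))   # equals sq_dist(a, b)
cross = sq_dist(augment(a, "User1"), augment(b, "User2"))  # larger than sq_dist(a, b)
```

The cross-partition distance grows by `weight ** 2` times the squared distance
between the two codes. The query vector would have to be augmented with the
same key. This only biases the graph towards clustering; it does not guarantee
that neighbours stay inside one OrdRange.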

On Mon, Jun 2, 2025 at 11:14 PM Michael Sokolov <[email protected]> wrote:

> I wonder if you could influence the graph search by incorporating the
> partition key (customer id?) to the vectors somehow? If this was done
> well it should lead to a natural clustering of the graph.
>
> On Mon, Jun 2, 2025 at 11:32 AM Ravikumar Govindarajan
> <[email protected]> wrote:
> >
> > Hi Michael,
> >
> > The number of docs per range varies widely, from a few tens to tens of
> > thousands and, in very heavy usage cases, 100k and above, all within a
> > single segment.
> >
> > Filtered Hnsw, as you said, uses a single graph, which could work better
> > if designed as sub-graphs.
> >
> > On Mon, 2 Jun 2025 at 5:42 PM, Michael Sokolov <[email protected]> wrote:
> >
> > > How many documents do you anticipate in a typical sub range? If it's in
> > > the hundreds or even low thousands you would be better off without hnsw.
> > > Instead you can use a function score query based on the vector distance.
> > > For larger numbers where hnsw becomes useful, you could try using
> > > filtered hnsw, but this will be using a single graph constructed from all
> > > of the documents.
> > >
> > > On Mon, Jun 2, 2025, 5:25 AM Ravikumar Govindarajan <
> > > [email protected]> wrote:
> > >
> > > > We use index-sorting to arrange segment data. The ord-ranges for any
> > > > given KnnVectorField are mutually exclusive.
> > > >
> > > > Ex:
> > > > field: content
> > > >
> > > > OrdRange -> 0-100 (User1)
> > > > OrdRange -> 101-300 (User2)
> > > > and so on..
> > > >
> > > > Each OrdRange has to be a self-contained Hnsw graph with all neighbours
> > > > strictly inside the given OrdRange: a sub-graph, to be precise. The
> > > > generated segment will contain a lot of these sub-graphs, but without
> > > > any neighbour links to each other at Level-0. Level-1 and above can
> > > > have cross-links, which should be fine.
> > > >
> > > > Searches will be based on OrdRange and should stop once the sub-graph
> > > > is fully explored, without crossing over to other sub-graphs.
> > > >
> > > > I can index them as different fields, but the number of fields could
> > > > run into a few hundreds (if not thousands).
> > > >
> > > > Are there any strategies I can adopt to accomplish this? Can a custom
> > > > VectorScoringFunction solve this? (For example: assign the actual
> > > > score if ords are in range, and 0 if they are out of range.)
> > > >
> > > > Is this the correct way of looking at the problem?
> > > >
> > > > Any help is much appreciated
> > > >
> > > > Regards,
> > > > Ravi
> > > >
> > >
>
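
For the small ranges discussed above (tens to low thousands of docs), the
suggestion of skipping hnsw and scoring by vector distance can be sketched as
an exact scan over one OrdRange. This is an illustration in plain Python, not
Lucene code. The `exact_top_k` helper is a hypothetical name, and cosine
similarity is an assumed choice of distance:

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def exact_top_k(vectors, query, ord_range, k):
    """Score only ords in [lo, hi] (inclusive) and return (score, ord), best first."""
    lo, hi = ord_range
    scored = ((cosine(vectors[o], query), o) for o in range(lo, hi + 1))
    return heapq.nlargest(k, scored)


# Ord 0 belongs to another user's range and is never scored,
# even though it matches the query exactly.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.1]]
hits = exact_top_k(vectors, [1.0, 0.0], (1, 3), k=2)
```

Because the scan never leaves the range, results cannot cross over to other
users, at the cost of scoring every vector in the range.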
