Hi Michael,

The number of docs per range can vary to extremes: from a few tens to tens of
thousands, and in very heavy usage cases 100k and above, all in a single segment.
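
For the small ranges at the low end, here is a rough sketch of the exact scan you
suggest. It is plain Java over an in-memory float[][], not Lucene's function score
query, and the class name, the 1/(1 + d²) similarity and the EXACT_SCAN_MAX cutoff
are my own assumptions:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy sketch: exact (brute-force) top-k over one contiguous ord range. Not Lucene code. */
public class RangeExactSearch {

    /** Hypothetical cutoff: ranges at or below this size get an exact scan instead of HNSW. */
    static final int EXACT_SCAN_MAX = 5_000;

    static boolean useExactScan(int start, int end) {
        return end - start + 1 <= EXACT_SCAN_MAX;
    }

    /** Similarity from squared Euclidean distance; higher means closer. */
    static float score(float[] a, float[] b) {
        float sq = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sq += d * d;
        }
        return 1f / (1f + sq);
    }

    /** Scores every ord in [start, end] (inclusive) and returns the k best ords, best first. */
    static List<Integer> topK(float[][] vectors, int start, int end, float[] query, int k) {
        float[] scores = new float[end - start + 1];
        for (int ord = start; ord <= end; ord++) {
            scores[ord - start] = score(vectors[ord], query);
        }
        // Min-heap on score: the head is the weakest hit kept so far.
        PriorityQueue<Integer> heap =
                new PriorityQueue<>(Comparator.comparingDouble(o -> scores[o - start]));
        for (int ord = start; ord <= end; ord++) {
            heap.offer(ord);
            if (heap.size() > k) {
                heap.poll(); // evict the weakest
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((Integer o) -> scores[o - start]).reversed());
        return result;
    }

    public static void main(String[] args) {
        // Toy layout: ords 0-2 belong to User1, ords 3-5 to User2.
        float[][] vectors = {
            {0f, 0f}, {1f, 0f}, {5f, 5f},
            {0.1f, 0f}, {9f, 9f}, {0.2f, 0.1f}
        };
        float[] query = {0f, 0f};
        System.out.println(useExactScan(0, 2));            // true
        System.out.println(topK(vectors, 0, 2, query, 2)); // [0, 1]
    }
}
```

The cutoff is only a placeholder; the real crossover point would have to be
measured against HNSW on our own ranges.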

Filtered HNSW, as you said, uses a single graph built from all the documents. This
could work better if the graph were designed as per-range sub-graphs.
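
To make the sub-graph idea concrete, here is a toy sketch. It is my own
assumption and neither HNSW nor Lucene code: a single-layer graph where each
node's neighbours are picked by brute force only from its own ord range, plus a
best-first search that only follows stored links, so it cannot leave the range
it was entered in. The class name and the maxConn / ef parameters are illustrative:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

/**
 * Toy sketch: a single-layer proximity graph whose links never cross ord ranges,
 * so each range is a disconnected sub-graph. Not HNSW and not Lucene code.
 */
public class RangeSubGraphs {
    final float[][] vectors;
    final int[][] neighbours; // neighbours[ord] = ords linked from ord

    /** ranges[i] = {start, end} (inclusive); assumed to cover every ord without overlap. */
    RangeSubGraphs(float[][] vectors, int[][] ranges, int maxConn) {
        this.vectors = vectors;
        this.neighbours = new int[vectors.length][];
        for (int[] r : ranges) {
            for (int ord = r[0]; ord <= r[1]; ord++) {
                final float[] self = vectors[ord];
                // Candidate neighbours come only from the same range: no cross-range links.
                List<Integer> cands = new ArrayList<>();
                for (int o = r[0]; o <= r[1]; o++) {
                    if (o != ord) cands.add(o);
                }
                cands.sort(Comparator.comparingDouble(o -> dist(self, vectors[o])));
                neighbours[ord] = cands.subList(0, Math.min(maxConn, cands.size()))
                        .stream().mapToInt(Integer::intValue).toArray();
            }
        }
    }

    /** Squared Euclidean distance; smaller means closer. */
    static float dist(float[] a, float[] b) {
        float sq = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sq += d * d;
        }
        return sq;
    }

    /** Best-first search from entry; keeps the ef closest ords seen, returns the k closest. */
    List<Integer> search(float[] query, int entry, int k, int ef, Set<Integer> visited) {
        Comparator<Integer> byDist = Comparator.comparingDouble(o -> dist(query, vectors[o]));
        PriorityQueue<Integer> candidates = new PriorityQueue<>(byDist);         // closest first
        PriorityQueue<Integer> results = new PriorityQueue<>(byDist.reversed()); // farthest first
        visited.add(entry);
        candidates.add(entry);
        results.add(entry);
        while (!candidates.isEmpty()) {
            int c = candidates.poll();
            // Stop once the closest unexplored candidate is worse than the worst kept result.
            if (dist(query, vectors[c]) > dist(query, vectors[results.peek()])) {
                break;
            }
            for (int n : neighbours[c]) {
                if (!visited.add(n)) continue; // already seen
                if (results.size() < ef
                        || dist(query, vectors[n]) < dist(query, vectors[results.peek()])) {
                    candidates.add(n);
                    results.add(n);
                    if (results.size() > ef) results.poll(); // drop the farthest
                }
            }
        }
        List<Integer> out = new ArrayList<>(results);
        out.sort(byDist);
        return out.subList(0, Math.min(k, out.size()));
    }

    public static void main(String[] args) {
        float[][] vectors = {
            {0f, 0f}, {1f, 0f}, {5f, 5f},      // User1: ords 0-2
            {0.1f, 0f}, {9f, 9f}, {0.2f, 0.1f} // User2: ords 3-5
        };
        RangeSubGraphs g = new RangeSubGraphs(vectors, new int[][] {{0, 2}, {3, 5}}, 2);
        Set<Integer> visited = new HashSet<>();
        // The query sits right on ord 3 (User2), but a search entered in User1's range stays there.
        System.out.println(g.search(new float[] {0.1f, 0f}, 0, 2, 3, visited)); // [0, 1]
        System.out.println(visited); // contains only ords 0-2
    }
}
```

In a real HNSW build the upper levels could still share links, as described
below; this toy keeps only the Level-0 behaviour.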

On Mon, 2 Jun 2025 at 5:42 PM, Michael Sokolov <msoko...@gmail.com> wrote:

> How many documents do you anticipate in a typical sub-range? If it's in the
> hundreds or even low thousands, you would be better off without HNSW.
> Instead, you can use a function score query based on the vector distance.
> For larger numbers, where HNSW becomes useful, you could try filtered
> HNSW, but this will use a single graph constructed from all of the
> documents.
>
> On Mon, Jun 2, 2025, 5:25 AM Ravikumar Govindarajan <ravikumar.govindara...@gmail.com> wrote:
>
> > We use index-sorting to arrange segment data. The ord-ranges for any given
> > KnnVectorField are mutually exclusive.
> >
> > Ex:
> > field: content
> >
> > OrdRange -> 0-100 (User1)
> > OrdRange -> 101-300 (User2)
> > and so on.
> >
> > Each OrdRange has to be a self-contained HNSW graph with all neighbours
> > strictly inside the given OrdRange: a sub-graph, to be precise. The
> > generated segment will contain many of these sub-graphs, with no
> > neighbour links between them at Level-0. Level-1 and above can have
> > cross-links, which should be fine.
> >
> > Searches will be based on OrdRange and should stop once the sub-graph is
> > fully explored, never crossing over to other sub-graphs.
> >
> > I can index them as different fields, but the number of fields could run
> > into the hundreds (if not thousands).
> >
> > Are there any strategies I can adopt to accomplish this? Can a custom
> > VectorScoringFunction solve this? (For example: assign the actual score if
> > the ord is in range, and 0 if it is out of range.)
> >
> > Is this the correct way of looking at the problem?
> >
> > Any help is much appreciated
> >
> > Regards,
> > Ravi
> >
>
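
On the custom scoring idea, a minimal sketch of range masking in plain Java.
OrdScorer, rangeOf and masked are hypothetical names, not Lucene APIs: rangeOf
finds which OrdRange owns an ord by binary search over the sorted range starts,
and masked returns the real score inside the range and 0 outside. As far as I
can tell, masking alone only changes scores; it does not remove the links between
sub-graphs, so a graph search could still step onto out-of-range ords even
though they would rank last.

```java
import java.util.Arrays;

/** Toy sketch: ord-range lookup plus a range-masked scorer. All names are hypothetical, not Lucene APIs. */
public class MaskedRangeScorer {

    /** Hypothetical scoring callback: similarity between the query and the vector stored at ord. */
    interface OrdScorer {
        float score(int ord);
    }

    /**
     * Index of the range that owns ord, given the sorted start ord of each contiguous range
     * (range i covers starts[i] .. starts[i + 1] - 1). Assumes ord >= starts[0].
     */
    static int rangeOf(int[] starts, int ord) {
        int i = Arrays.binarySearch(starts, ord);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the owner is the range before it.
        return i >= 0 ? i : -i - 2;
    }

    /** Wraps base so ords in [start, end] keep their real score and every other ord scores 0. */
    static OrdScorer masked(OrdScorer base, int start, int end) {
        return ord -> (ord >= start && ord <= end) ? base.score(ord) : 0f;
    }

    public static void main(String[] args) {
        int[] starts = {0, 101, 301}; // User1: 0-100, User2: 101-300, User3 starts at 301
        System.out.println(rangeOf(starts, 150)); // 1 (User2)

        // Constant stand-in for a real vector similarity, to keep the sketch self-contained.
        OrdScorer base = ord -> 0.5f;
        OrdScorer user2Only = masked(base, 101, 300);
        System.out.println(user2Only.score(150)); // 0.5
        System.out.println(user2Only.score(50));  // 0.0
    }
}
```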
