How many documents do you expect in a typical sub-range? If it's in the
hundreds or even low thousands, you would be better off without HNSW,
since an exact scan over that few vectors is cheap. Instead, you can use
a function score query based on the vector distance, restricted to the
documents in the range. For larger ranges, where HNSW becomes useful, you
could try filtered HNSW search, but note that it will use a single graph
constructed from all of the documents, not a separate sub-graph per range.
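
To make the small-range case concrete, here is a rough sketch of an exact
scan limited to one ord range. It is plain Java and does not use any
Lucene API: the class OrdRangeExactSearch, the Hit type, the in-memory
float[][] of vectors indexed by ord, and the choice of dot product as the
similarity are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper: exact top-k search restricted to one ord range. */
public class OrdRangeExactSearch {

    /** One result: the vector's ord and its score against the query. */
    public static final class Hit {
        public final int ord;
        public final float score;

        Hit(int ord, float score) {
            this.ord = ord;
            this.score = score;
        }
    }

    /** Dot product, used here as the (assumed) similarity function. */
    static float dotProduct(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /**
     * Scores every vector with ord in [fromOrd, toOrd] (inclusive) and
     * returns the k best hits, highest score first. Ords outside the
     * range are never visited, so the search cannot cross into a
     * neighbouring range.
     */
    public static List<Hit> search(
            float[][] vectors, int fromOrd, int toOrd, float[] query, int k) {
        List<Hit> hits = new ArrayList<>();
        if (k <= 0) {
            return hits;
        }
        // Min-heap on score: the head is the weakest of the current top k.
        PriorityQueue<Hit> heap =
                new PriorityQueue<>((x, y) -> Float.compare(x.score, y.score));
        for (int ord = fromOrd; ord <= toOrd; ord++) {
            float score = dotProduct(vectors[ord], query);
            if (heap.size() < k) {
                heap.add(new Hit(ord, score));
            } else if (score > heap.peek().score) {
                heap.poll();
                heap.add(new Hit(ord, score));
            }
        }
        hits.addAll(heap);
        // Sort descending by score for the caller.
        hits.sort((x, y) -> Float.compare(y.score, x.score));
        return hits;
    }
}
```

The cost is one similarity computation per vector in the range, so for a
range of a few hundred ords this is a few hundred dot products per query,
with no graph to build or maintain.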

On Mon, Jun 2, 2025, 5:25 AM Ravikumar Govindarajan <
ravikumar.govindara...@gmail.com> wrote:

> We use index-sorting to arrange segment data. The ord-ranges for any given
> KnnVectorField are mutually exclusive
>
> Ex:
> field: content
>
> OrdRange -> 0-100 (User1)
> OrdRange -> 101-300 (User2)
> and so on..
>
> Each OrdRange has to be a self-contained Hnsw graph with all neighbours
> strictly inside the given OrdRange. A sub-graph, to be precise.. The
> generated segment will contain a lot of these sub-graphs but without any
> neighbour links to each other at Level-0.  Level-1 and above can have
> cross-links, which should be fine..
>
> Searches will be based on OrdRange and should stop once the sub-graph is
> fully explored and not cross over to other sub-graphs..
>
> I can index them as different fields but it could run into a few hundreds
> (if not thousands).
>
> Are there any strategies I can adopt to accomplish this? Can a custom
> VectorScoringFunction solve this? (Like -> assign actual score, if ords are
> in range. Assign 0, if out-of-range etc..)
>
> Is this the correct way of looking at the problem?
>
> Any help is much appreciated
>
> Regards,
> Ravi
>
