How many documents do you anticipate in a typical sub range? If it's in the hundreds or even low thousands, you would be better off without HNSW. Instead, you can use a function score query based on the vector distance, which scores every document in the range exactly. For larger ranges, where HNSW becomes useful, you could try filtered HNSW search, but note that it searches a single graph constructed from all of the documents; the filter only restricts which documents are returned.
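To make the "no HNSW for small ranges" suggestion concrete, here is a minimal plain-Java sketch of exact (brute-force) top-k search restricted to one ord range. It does not use any Lucene API. The class name `RangeExactKnn`, the `Hit` record, and dot-product similarity are all illustrative assumptions. In Lucene itself, the equivalent idea would be a function score query over the documents of one range, with a values source computing similarity to the query vector. The cost is linear in the range size, which is why it only pays off for small ranges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration: exact k-NN over a contiguous ord range. */
public class RangeExactKnn {

    /** A hit: vector ordinal plus its similarity score. */
    record Hit(int ord, float score) {}

    /** Dot-product similarity; assumes both vectors have the same length. */
    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /**
     * Scores every ordinal in [startOrd, endOrd] (inclusive) and returns
     * the k best hits, highest score first. No graph is involved, so
     * ordinals outside the range are never visited.
     */
    static List<Hit> topK(float[][] vectors, int startOrd, int endOrd, float[] query, int k) {
        // Min-heap on score: the head is the weakest hit kept so far.
        PriorityQueue<Hit> heap = new PriorityQueue<>((a, b) -> Float.compare(a.score(), b.score()));
        for (int ord = startOrd; ord <= endOrd; ord++) {
            heap.offer(new Hit(ord, dot(vectors[ord], query)));
            if (heap.size() > k) {
                heap.poll(); // evict the current weakest hit
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort((a, b) -> Float.compare(b.score(), a.score())); // best first
        return result;
    }

    public static void main(String[] args) {
        float[][] vectors = {
            {1f, 0f}, {0f, 1f}, {0.9f, 0.1f}, // ords 0-2: "User1"
            {1f, 1f}, {0.2f, 0.8f}            // ords 3-4: "User2"
        };
        float[] query = {1f, 0f};
        // Search only User1's range; ord 3 scores as high as ord 0 but is out of range.
        for (Hit h : topK(vectors, 0, 2, query, 2)) {
            System.out.println(h.ord() + " " + h.score());
        }
    }
}
```

Here the search over ords 0-2 returns ord 0 followed by ord 2. Nothing from User2's range can leak in, because only the requested ordinals are scored.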
On Mon, Jun 2, 2025, 5:25 AM Ravikumar Govindarajan <ravikumar.govindara...@gmail.com> wrote:

> We use index sorting to arrange segment data. The ord ranges for any given
> KnnVectorField are mutually exclusive.
>
> Ex:
> field: content
>
> OrdRange -> 0-100 (User1)
> OrdRange -> 101-300 (User2)
> and so on...
>
> Each OrdRange has to be a self-contained HNSW graph with all neighbours
> strictly inside the given OrdRange. A sub-graph, to be precise. The
> generated segment will contain many of these sub-graphs, but without any
> neighbour links to each other at Level-0. Level-1 and above can have
> cross-links, which should be fine.
>
> Searches will be based on OrdRange and should stop once the sub-graph is
> fully explored, without crossing over to other sub-graphs.
>
> I can index them as different fields, but that could run into a few hundreds
> (if not thousands).
>
> Are there any strategies I can adopt to accomplish this? Can a custom
> VectorScoringFunction solve this? (e.g. assign the actual score if ords are
> in range, assign 0 if out of range, etc.)
>
> Is this the correct way of looking at the problem?
>
> Any help is much appreciated.
>
> Regards,
> Ravi