I do think there could be many interesting use cases for building multiple graphs from a single set of vectors. For example, one might want to search all the docs sometimes, one subset at other times, and a different subset at still other times. Baking the constraint into graph construction would lead to more efficient searches than the graph search filtering we can do today (pre- and post-filtering). There could also be use cases where the constraints are present so often that we would want to pay the up-front cost of computing multiple graphs, without paying the cost of storing the same vectors multiple times in the index. This isn't supported today, but I think it would be a welcome contribution.
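For comparison, the small-range case raised earlier in this thread (hundreds to low thousands of docs per range) can be served by an exact scan with no graph at all. Below is a minimal sketch of that baseline in plain Java. It does not use any Lucene API: the class name OrdRangeExactSearch, the in-memory float[][] layout, and the choice of dot product as the similarity are all assumptions made for illustration.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical helper (not a Lucene class): exact top-k search restricted to one ord range. */
final class OrdRangeExactSearch {

    /** Dot product of two equal-length vectors; stands in for whatever similarity the field uses. */
    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /**
     * Scores every vector whose ord lies in [minOrd, maxOrd] (inclusive) against the query
     * and returns up to k ords, best score first. Ords outside the range are never visited.
     */
    static int[] topK(float[][] vectors, float[] query, int minOrd, int maxOrd, int k) {
        float[] scores = new float[maxOrd - minOrd + 1];
        // Min-heap keyed on score, so the weakest of the current top k is evicted first.
        PriorityQueue<Integer> heap =
            new PriorityQueue<>(Comparator.comparingDouble((Integer ord) -> scores[ord - minOrd]));
        for (int ord = minOrd; ord <= maxOrd; ord++) {
            scores[ord - minOrd] = dot(vectors[ord], query);
            heap.add(ord);
            if (heap.size() > k) {
                heap.poll();
            }
        }
        // The heap yields the weakest entry first, so fill the result from the back.
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll();
        }
        return result;
    }
}
```

The cost is linear in the size of the range, which is why this only makes sense for small ranges. For the 100k-and-above ranges mentioned in the thread, a graph (filtered today, or per-subset if someone contributes it) is the more plausible option.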

On Wed, Jun 4, 2025 at 3:51 AM Ravikumar Govindarajan <ravikumar.govindara...@gmail.com> wrote:
>
> > I wonder if you could influence the graph search by incorporating the
> > partition key (customer id?) to the vectors somehow? If this was done
> > well it should lead to a natural clustering of the graph.
>
> I can explore further on this. Thanks for the pointers..
>
> On Mon, Jun 2, 2025 at 11:14 PM Michael Sokolov <msoko...@gmail.com> wrote:
>
> > I wonder if you could influence the graph search by incorporating the
> > partition key (customer id?) to the vectors somehow? If this was done
> > well it should lead to a natural clustering of the graph.
> >
> > On Mon, Jun 2, 2025 at 11:32 AM Ravikumar Govindarajan
> > <ravikumar.govindara...@gmail.com> wrote:
> > >
> > > Hi Michael,
> > >
> > > The docs range could vary in extremes from few 10s to tens-of-thousands
> > > and in very heavy usage cases, 100k and above… in a single segment
> > >
> > > Filtered Hnsw like you said uses a single graph.., which could be better
> > > if designed as sub-graphs
> > >
> > > On Mon, 2 Jun 2025 at 5:42 PM, Michael Sokolov <msoko...@gmail.com> wrote:
> > >
> > > > How many documents do you anticipate in a typical sub range? If it's in the
> > > > hundreds or even low thousands you would be better off without hnsw.
> > > > Instead you can use a function score query based on the vector distance.
> > > > For larger numbers where hnsw becomes useful, you could try using filtered
> > > > hnsw, but this will be using a single graph constructed from all of the
> > > > documents.
> > > >
> > > > On Mon, Jun 2, 2025, 5:25 AM Ravikumar Govindarajan <ravikumar.govindara...@gmail.com> wrote:
> > > >
> > > > > We use index-sorting to arrange segment data. The ord-ranges for any given
> > > > > KnnVectorField is mutually exclusive
> > > > >
> > > > > Ex:
> > > > > field: content
> > > > >
> > > > > OrdRange -> 0-100 (User1)
> > > > > OrdRange -> 101-300 (User2)
> > > > > and so on..
> > > > >
> > > > > Each OrdRange has to be a self-contained Hnsw graph with all neighbours
> > > > > strictly inside the given OrdRange. A sub-graph, to be precise.. The
> > > > > generated segment will contain a lot of these sub-graphs but without any
> > > > > neighbour links to each other at Level-0. Level-1 and above can have
> > > > > cross-links, which should be fine..
> > > > >
> > > > > Searches will be based on OrdRange and should stop once the sub-graph is
> > > > > fully explored and not cross over to other sub-graphs..
> > > > >
> > > > > I can index them as different fields but it could run into a few hundreds
> > > > > (if not thousands).
> > > > >
> > > > > Are there any strategies I can adopt to accomplish this? Can a custom
> > > > > VectorScoringFunction solve this? (Like -> assign actual score, if ords are
> > > > > in range. Assign 0, if out-of-range etc..)
> > > > >
> > > > > Is this the correct way of looking at the problem?
> > > > >
> > > > > Any help is much appreciated
> > > > >
> > > > > Regards,
> > > > > Ravi
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org