Hi all,

We happen to be testing similar things. Based on our experience:
1) For an index that is no longer changing, issuing the same query repeatedly returns the same results. This holds with concurrent segment search enabled, but we are not sure it still holds after https://github.com/apache/lucene/pull/12962.
2) If the load order or the segment boundaries differ (e.g. docs 1, 2 | 3, 4, 5 vs. 1, 2, 3 | 4, 5), the query results can differ.

Thanks!

On Thu, Sep 12, 2024 at 9:27 PM Marc Davenport <madavenp...@cargurus.com.invalid> wrote:
> Adrien & Michael,
> Thanks for confirming what I suspected. I think in the long run I will be
> OK, as our users already have a sticky session to an instance for other
> reasons.
>
> Marc
>
> On Thu, Sep 12, 2024 at 6:03 PM Michael Sokolov <msoko...@gmail.com> wrote:
> >
> > If your two indexes load data sequentially and in the same order, then I
> > believe that you would get the same results. But we consider this an
> > implementation detail rather than a guarantee that Lucene should have.
> >
> > You might even still be surprised by nondeterminism arising from
> > concurrency during merging, which should be the default in recent
> > versions.
> >
> > On Thu, Sep 12, 2024 at 4:53 PM Adrien Grand <jpou...@gmail.com> wrote:
> > >
> > > Indeed, the load order can influence Lucene's approximate nearest
> > > neighbor search results.
> > >
> > > If your two indexes load data sequentially and in the same order, then I
> > > believe that you would get the same results. But we consider this an
> > > implementation detail rather than a guarantee that Lucene should have.
> > >
> > > On Thu, Sep 12, 2024 at 7:03 PM Marc Davenport
> > > <madavenp...@cargurus.com.invalid> wrote:
> > > >
> > > > Hello,
> > > > I've been working on this personalization project using KNN queries,
> > > > and I have a couple of questions, but one is more pressing for me than
> > > > the others.
> > > >
> > > > 1) Inconsistency between index instances:
> > > > All of the same documents are loaded into different indexes. They may
> > > > be loaded in a different order, but the set of documents will be
> > > > consistent when done. I'm finding that when I ask for the top 1000 kNN
> > > > documents, I sometimes get inconsistent results between indexes.
> > > > Results are always consistent from an individual instance. If we assume
> > > > I haven't made a mistake and the universe of documents is the same in
> > > > all instances, can the document load order have an effect on what is
> > > > considered the nearest neighbors? What if I am processing updates to
> > > > the index at different rates on each machine, but the end data is all
> > > > the same?
> > > >
> > > > Thank you,
> > > > Marc
> > >
> > > --
> > > Adrien
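To illustrate point 2 and Adrien's remark that load order can influence approximate nearest neighbor results, here is a deliberately simplified toy in Java. It is not Lucene's HNSW code, and the class `InsertionOrderDemo` and its methods are hypothetical: the graph has a single layer, each inserted value links only to its nearest earlier value, and search is plain greedy descent from the first inserted value with no beam. Each "segment" gets its own graph and the closest per-segment hit wins, which is only a rough analogue of Lucene searching each segment's graph separately and merging the hits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy, NOT Lucene's HNSW: single-layer proximity graph over 1-D values.
// Each inserted value links to its nearest earlier value; search is greedy descent
// from the first inserted value (no layers, no beam, one neighbor per insert).
class InsertionOrderDemo {

    // Inserts values in array order; returns undirected adjacency lists by index.
    static List<List<Integer>> buildGraph(double[] values) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            adj.add(new ArrayList<>());
        }
        for (int node = 1; node < values.length; node++) {
            int best = 0;
            for (int other = 1; other < node; other++) {
                if (Math.abs(values[other] - values[node]) < Math.abs(values[best] - values[node])) {
                    best = other;
                }
            }
            // Link the new value to its nearest already-inserted value.
            adj.get(node).add(best);
            adj.get(best).add(node);
        }
        return adj;
    }

    // Greedy search: move to the closest neighbor while it is strictly closer to the query.
    static double approximateNearest(double[] loadOrder, double query) {
        List<List<Integer>> adj = buildGraph(loadOrder);
        int current = 0; // entry point = first value loaded
        while (true) {
            int next = current;
            for (int neighbor : adj.get(current)) {
                if (Math.abs(loadOrder[neighbor] - query) < Math.abs(loadOrder[next] - query)) {
                    next = neighbor;
                }
            }
            if (next == current) {
                return loadOrder[current]; // local minimum reached
            }
            current = next;
        }
    }

    // Brute-force search: independent of load order and segmentation.
    static double exactNearest(double[] values, double query) {
        double best = values[0];
        for (double v : values) {
            if (Math.abs(v - query) < Math.abs(best - query)) {
                best = v;
            }
        }
        return best;
    }

    // One graph per "segment", searched independently; keep the closest hit overall.
    static double approximateNearestAcrossSegments(double[][] segments, double query) {
        double best = approximateNearest(segments[0], query);
        for (int s = 1; s < segments.length; s++) {
            double hit = approximateNearest(segments[s], query);
            if (Math.abs(hit - query) < Math.abs(best - query)) {
                best = hit;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double query = 6.0;
        System.out.println(approximateNearest(new double[] {0, 10, 3, 8}, query)); // 3.0
        System.out.println(approximateNearest(new double[] {8, 10, 0, 3}, query)); // 8.0
        System.out.println(approximateNearestAcrossSegments(new double[][] {{0, 10, 3}, {8}}, query)); // 8.0
        System.out.println(exactNearest(new double[] {0, 10, 3, 8}, query)); // 8.0
    }
}
```

With query 6.0, loading 0, 10, 3, 8 returns 3.0 while loading 8, 10, 0, 3 returns 8.0; splitting the first order into segments {0, 10, 3} | {8} also returns 8.0, and brute-force search returns 8.0 regardless. Real HNSW graphs keep several neighbors per node and explore a wider candidate queue, which makes such divergence much rarer than in this toy, but not impossible.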