Indeed, the load order can influence Lucene's approximate nearest neighbor search results.
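
To illustrate why insertion order can matter for a graph-based approximate search, here is a toy sketch in Python. It is not Lucene's HNSW implementation: it assumes a single-layer graph over 1-D integer "vectors", where each new point links only to its nearest already-inserted point and the search is a plain greedy walk from a fixed entry point. The names `build_graph` and `greedy_search` are hypothetical helpers invented for this example.

```python
def build_graph(points):
    """Insert points in the given order.

    Toy assumption: each new point is linked (bidirectionally) to the
    single nearest point that is already in the graph.
    """
    neighbors = {}
    for p in points:
        existing = list(neighbors)          # points inserted so far
        neighbors[p] = set()
        if existing:
            nearest = min(existing, key=lambda o: abs(o - p))
            neighbors[p].add(nearest)
            neighbors[nearest].add(p)
    return neighbors


def greedy_search(neighbors, entry, query):
    """Walk from `entry` to whichever neighbor is closer to `query`,
    stopping when no neighbor improves on the current node."""
    current = entry
    while True:
        best = min(neighbors[current],
                   key=lambda n: abs(n - query),
                   default=current)
        if abs(best - query) >= abs(current - query):
            return current
        current = best


# Same three documents, two different load orders, same entry point.
graph_a = build_graph([50, 100, 80])   # links: 50-100, 100-80
graph_b = build_graph([50, 80, 100])   # links: 50-80, 80-100

query = 66                              # exact nearest neighbor is 80
result_a = greedy_search(graph_a, entry=50, query=query)
result_b = greedy_search(graph_b, entry=50, query=query)

print(result_a)  # 50 (the walk stalls: the only neighbor, 100, is farther away)
print(result_b)  # 80 (the exact nearest neighbor)
```

Both graphs hold the same points, but the links created at insert time depend on which points were already present, so the approximate search can settle on different answers. An exact (brute-force) search over the same data would return 80 in both cases.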
If your two indexes load data sequentially and in the same order, then I believe that you would get the same results. But we consider this an implementation detail rather than a guarantee that Lucene should have.

On Thu, Sep 12, 2024 at 7:03 PM Marc Davenport <madavenp...@cargurus.com.invalid> wrote:

> Hello,
> I've been working on this personalization project using KNN queries and I
> have a couple questions but one is more pressing for me than the others.
>
> 1) Inconsistency between index instances:
> All of the same documents are loaded into different indexes. They may be
> loaded in different order, but the set of documents will be consistent when
> done. I'm finding that when I ask for the 1000 knn documents I sometimes
> get inconsistent results between each index. Results are always consistent
> from an individual instance. If we assume I haven't made a mistake and
> the universe of documents are the same in all instances, can the document
> load order have an effect on what is considered the nearest neighbors?
> What if I am processing updates to the index at different rates on each
> machine, but the end data is all the same?
>
> Thank you,
> Marc
>

--
Adrien