Hi all,

We happen to be testing something similar. Based on our experience:

1) For an index that is no longer changing, issuing the same query
repeatedly returns the same results. This holds even with concurrent
segment search enabled. We are not sure whether it still holds after
https://github.com/apache/lucene/pull/12962.

2) If the load order or the segment boundaries differ (for example, docs
1, 2 | 3, 4, 5 versus 1, 2, 3 | 4, 5), the query results can differ.
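
To put a number on the two observations above, here is a minimal sketch in
Python. The helper names (topk_overlap, is_repeatable) and the sample IDs are
hypothetical and not part of Lucene. It assumes each hit carries a stable
application-level ID, since Lucene's internal doc IDs are not comparable
across separately built indexes.

```python
def topk_overlap(ids_a, ids_b):
    """Fraction of document IDs shared by two top-k result lists (order ignored)."""
    set_a, set_b = set(ids_a), set(ids_b)
    if not set_a and not set_b:
        return 1.0  # two empty result lists agree trivially
    return len(set_a & set_b) / max(len(set_a), len(set_b))


def is_repeatable(run_query, repeats=3):
    """True if calling run_query() `repeats` times yields the same ID list each time."""
    first = list(run_query())
    return all(list(run_query()) == first for _ in range(repeats - 1))


# Hypothetical top-4 results for one query from two instances that hold the
# same documents but were loaded in a different order.
instance_a = ["d1", "d2", "d3", "d4"]
instance_b = ["d1", "d2", "d4", "d7"]
print(topk_overlap(instance_a, instance_b))  # 0.75
```

is_repeatable takes any zero-argument callable that runs the query against
one instance and returns the hit IDs. An overlap below 1.0 between instances
does not by itself point to a bug, given the approximate nature of the
search discussed in this thread.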

Thanks!


On Thu, Sep 12, 2024 at 9:27 PM Marc Davenport
<madavenp...@cargurus.com.invalid> wrote:

> Adrien & Michael,
> Thanks for confirming what I suspected.  I think in the long run I will be
> ok as our users have a sticky session to an instance for some other reasons
> already.
>
> Marc
>
> On Thu, Sep 12, 2024 at 6:03 PM Michael Sokolov <msoko...@gmail.com>
> wrote:
>
> > > If your two indexes load data sequentially and in the same order, then I
> > > believe that you would get the same results. But we consider this an
> > > implementation detail rather than a guarantee that Lucene should have.
> >
> > You might even still be surprised by nondeterminism arising from
> > concurrency during merging, which should be the default in recent
> > versions.
> >
> > On Thu, Sep 12, 2024 at 4:53 PM Adrien Grand <jpou...@gmail.com> wrote:
> > >
> > > Indeed, the load order can influence Lucene's approximate nearest neighbor
> > > search results.
> > >
> > > If your two indexes load data sequentially and in the same order, then I
> > > believe that you would get the same results. But we consider this an
> > > implementation detail rather than a guarantee that Lucene should have.
> > >
> > > On Thu, Sep 12, 2024 at 7:03 PM Marc Davenport
> > > <madavenp...@cargurus.com.invalid> wrote:
> > >
> > > > Hello,
> > > > I've been working on this personalization project using KNN queries and I
> > > > have a couple questions but one is more pressing for me than the others.
> > > >
> > > > 1) Inconsistency between index instances:
> > > > All of the same documents are loaded into different indexes. They may be
> > > > loaded in different order, but the set of documents will be consistent when
> > > > done. I'm finding that when I ask for the 1000 knn documents I sometimes
> > > > get inconsistent results between each index. Results are always consistent
> > > > from an individual instance. If we assume I haven't made a mistake and
> > > > the universe of documents is the same in all instances, can the document
> > > > load order have an effect on what is considered the nearest neighbors?
> > > > What if I am processing updates to the index at different rates on each
> > > > machine, but the end data is all the same?
> > > >
> > > > Thank you,
> > > > Marc
> > > >
> > >
> > >
> > > --
> > > Adrien
> >
