Thanks, David Smiley and Gus Heck, for this wonderful insight.
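
For anyone who hits the same problem: the alias will not resolve duplicates for you, so one way to follow David's advice is to check for overlapping IDs before putting collections behind a single alias. Below is a minimal sketch. It assumes you have already pulled the `id` values out of each collection (for example by paging with cursorMark), and `find_duplicate_ids` is a hypothetical helper, not part of Solr or any Solr client library.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def find_duplicate_ids(ids_by_collection: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Return every document ID that appears in more than one collection.

    Hypothetical helper: ids_by_collection maps a collection name
    (e.g. "CollectionA") to the document IDs it holds. Fetching those IDs
    from Solr is assumed to happen elsewhere and is not shown here.
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for collection, ids in ids_by_collection.items():
        for doc_id in ids:
            seen[doc_id].add(collection)
    # Keep only IDs present in two or more collections; sort for stable output.
    return {
        doc_id: sorted(collections)
        for doc_id, collections in sorted(seen.items())
        if len(collections) > 1
    }


if __name__ == "__main__":
    sample = {
        "CollectionA": ["document1", "document2"],
        "CollectionB": ["document1", "document3"],
    }
    print(find_duplicate_ids(sample))
    # {'document1': ['CollectionA', 'CollectionB']}
```

An empty result means the collections are safe to alias together as far as IDs go; anything it returns would have to be removed from all but one collection first.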

On Thu, Mar 16, 2023 at 3:46 AM Gus Heck <gus.h...@gmail.com> wrote:

> I think I recall from past experience that if the ID is duplicated, you get
> one or the other, and which one you get is non-deterministic. But as this is
> an unsupported and untested configuration, I would also expect other things,
> like facet counts, to be thrown off. Also, if the schemas use different
> fields for identity, or the collections assign different IDs to the same
> document, then of course you will likely get both showing up in the same
> results. That said, this may have changed: maybe now it's possible to get two
> documents with the same ID back, or it has become deterministic in some way.
> AFAIK it's not a supported use case, so anything could have changed.
>
> In short, you probably should not alias two collections containing the same
> data into a single alias. Aliasing two collections with identical schema
> and **different** data is the expected use case for aliases that point to
> more than one collection. Schemas could be slightly different too, but
> results involving the non-matching fields will become hard to predict.
>
> As a practical example of this, in Time Routed Aliases (TRAs) it's
> important never to re-send the same document with a changed value in the
> routed field, as that will create two time slices (collections) that each
> contain a document with the same ID (see the very first warning here:
> https://solr.apache.org/guide/solr/latest/deployment-guide/aliases.html#routed-aliases
> )
>
> On Wed, Mar 15, 2023 at 5:02 PM David Smiley <dsmi...@apache.org> wrote:
>
> > When aliasing across collections, it's up to you/the-user to ensure that
> > they don't contain the same document (by ID). I don't believe this is
> > supported at all. If you find information to the contrary, let us know. I
> > could imagine some small code details to _do something_ if it could be
> > detected in some cases but that isn't a substitute for truly
> > working/supported.
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Tue, Mar 7, 2023 at 5:34 AM Vinayak Hegde <vinayakph...@gmail.com>
> > wrote:
> >
> > > Hello everyone,
> > > I hope this email finds you well. I am reaching out to discuss a strange
> > > situation we are facing with result grouping.
> > > We currently have two collections, CollectionA and CollectionB, both of
> > > which contain an identical document, document1. We have created a new
> > > alias collection that includes both CollectionA and CollectionB.
> > > However, when attempting to perform result grouping on this new alias
> > > collection, we are encountering an issue where two instances of
> > > document1 appear in the output.
> > >
> > > http://10.144.10.36:8983/solr/aliasCollection/select?q=id:document1&rows=40&group=true&group.field=fieldA&group.limit=20
> > >
> > > I have attempted to locate official documentation regarding this issue,
> > > but have been unsuccessful. The closest resource I found was this link:
> > >
> > > https://markmail.org/message/2ykh7wyexbnquc6s?q=list:org.apache.lucene.solr-user
> > > Please let me know if you have any insights or suggestions on how to
> > > resolve this issue.
> > > Thank you for your time and attention.
> > >
> > > Best regards,
> > > Vinayak Hegde
> > >
> >
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>
