Thanks, David Smiley and Gus Heck, for this wonderful insight.

On Thu, Mar 16, 2023 at 3:46 AM Gus Heck <gus.h...@gmail.com> wrote:
> I think I recall past experience that if the ID is duplicated, you get one
> or the other, and the one you get is non-deterministic, but as this is an
> unsupported and untested configuration, I would expect other things like
> facet counts etc to be thrown off. Also if the schemas use different fields
> for identity or the collections assign different id's to the same document
> then of course you likely get both showing up in the same results. That
> said this may have changed, and maybe now it's possible to get two with the
> same ID back, or it has become deterministic in some way. AFAIK It's not a
> supported use case so anything could have changed.
>
> In short, you probably should not alias two collections containing the same
> data into a single alias. Aliasing two collections with identical schema
> and **different** data is the expected use case for aliases that point to
> more than one collection. Schemas could be slightly different too, but
> results involving the non-matching fields will become hard to predict.
>
> As a practical example of this, in Time Routed Aliases (TRA's) it's
> important never to send the same document with changes to the value of the
> routed field as that will create two time slices (collections) with a
> document that has the same ID (see the very first warning here:
> https://solr.apache.org/guide/solr/latest/deployment-guide/aliases.html#routed-aliases )
>
> On Wed, Mar 15, 2023 at 5:02 PM David Smiley <dsmi...@apache.org> wrote:
>
> > When aliasing across collections, it's up to you/the-user to ensure that
> > they don't contain the same document (by ID). I don't believe this is
> > supported at all. If you find information to the contrary, let us know. I
> > could imagine some small code details to _do something_ if it could be
> > detected in some cases but that isn't a substitute for truly
> > working/supported.
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Tue, Mar 7, 2023 at 5:34 AM Vinayak Hegde <vinayakph...@gmail.com>
> > wrote:
> >
> > > Hello everyone,
> > > I hope this email finds you well. I am reaching out to discuss a strange
> > > situation we are facing with result grouping.
> > > We currently have two collections, CollectionA and CollectionB, both of
> > > which contain an identical document, document1. We have created a new alias
> > > collection that includes both CollectionA and CollectionB.
> > > However, when attempting to perform result grouping on this new alias
> > > collection, we are encountering an issue where two instances of document1
> > > appear in the output.
> > >
> > > http://10.144.10.36:8983/solr/aliasCollection/select?q=id:document1&rows=40&group=true&group.field=fieldA&group.limit=20
> > >
> > > I have attempted to locate official documentation regarding this issue, but
> > > have been unsuccessful. The closest resource I found was this link:
> > >
> > > https://markmail.org/message/2ykh7wyexbnquc6s?q=list:org.apache.lucene.solr-user
> > > .
> > > Please let me know if you have any insights or suggestions on how to
> > > resolve this issue.
> > > Thank you for your time and attention.
> > >
> > > Best regards,
> > > Vinayak Hegde
> > >
> >
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>
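
Since the advice in this thread is that the user has to make sure the aliased collections never hold the same document ID, here is a minimal sketch of how such a check might look before creating the alias. It is only an illustration: the function name find_shared_ids and the sample data are hypothetical, and it assumes the IDs of each collection have already been exported by some other means (for example, a query returning only the id field). It does not call Solr itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def find_shared_ids(ids_by_collection: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each document ID found in more than one collection to those collections.

    `ids_by_collection` is a hypothetical input: collection name -> IDs exported
    from that collection. How the IDs are fetched is outside this sketch.
    """
    owners: Dict[str, Set[str]] = defaultdict(set)
    for collection, ids in ids_by_collection.items():
        for doc_id in ids:
            owners[doc_id].add(collection)
    # Keep only IDs that appear in at least two collections.
    return {doc_id: sorted(cols) for doc_id, cols in owners.items() if len(cols) > 1}


if __name__ == "__main__":
    # Sample data mirroring the situation described in the thread.
    sample = {
        "CollectionA": ["document1", "document2"],
        "CollectionB": ["document1", "document3"],
    }
    print(find_shared_ids(sample))
```

An empty result would suggest the collections are safe to alias together as far as IDs go; any entry in the result points at a document that should be removed from one collection first.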