[ 
https://issues.apache.org/jira/browse/SOLR-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15536477#comment-15536477
 ] 

Erick Erickson commented on SOLR-9583:
--------------------------------------

Fair points. I'll have to code-dive (and that's NOT happening today for various 
reasons) to say something competent, but I'd guess that we _already_ do 
something with facets and doc counts and the like. If you're saying that 
whatever we do is probably wrong, then it seems like we should fail in this 
case rather than let the users blissfully drive on. "Fail or do it right" 
maybe? Or return some kind of warning? Or.....



> When the same <uniqueKey> exists across multiple collections that are 
> searched with an alias, the document returned in the results list is 
> indeterminate
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-9583
>                 URL: https://issues.apache.org/jira/browse/SOLR-9583
>             Project: Solr
>          Issue Type: Wish
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Erick Erickson
>
> Not quite sure whether to call this a bug or improvement...
> Currently if I have two collections C1 and C2 and an alias that points to 
> both _and_ I have a document in both collections with the _same_ <uniqueKey>, 
> the returned list sometimes has the doc from C1 and sometimes from C2.
> If I add shards.info=true I see the document found in each collection, but 
> only one in the document list. Which one changes if I re-submit the identical 
> query.
> This seems incorrect, perhaps a side effect of piggy-backing the collection 
> aliasing on searching multiple shards? (Thanks Shalin for that bit of 
> background).
> I can see both use-cases: 
> 1> aliasing multiple collections validly assumes that <uniqueKey>s should be 
> unique across them all and only one doc should be returned. Even in this case, 
> which doc is returned should be deterministic.
> 2> these are arbitrary collections without any a priori relationship and 
> identical <uniqueKey>s do NOT identify the "same" document, so both should be 
> returned.
> So I propose we do two things:
> a> provide a param for the CREATEALIAS command that controls whether docs 
> with the same <uniqueKey> from different collections should both be returned. 
> If they both should, there's still the question of in what order.
> b> provide a deterministic way dups from different collections are resolved. 
> What that algorithm is I'm not quite sure. The order the collections were 
> specified in the CREATEALIAS command? Some field in the documents? Other??? 
> What happens if this option is not specified on the CREATEALIAS command?
> Implicit in the above is my assumption that it's perfectly valid to have 
> different aliases in the same cluster behave differently if specified.
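
To make options a> and b> above concrete, here is a minimal sketch of one possible resolution rule: walk the collections in the order they were listed for the alias and, unless duplicates are explicitly wanted, keep only the first doc seen for each <uniqueKey>. This is NOT how Solr merges results today. The function name, the keep_duplicates flag, and the _collection marker are all hypothetical, and scoring/sort order is ignored to keep the example short.

```python
from typing import Dict, List


def merge_alias_results(
    collection_order: List[str],
    results_by_collection: Dict[str, List[dict]],
    unique_key: str = "id",
    keep_duplicates: bool = False,
) -> List[dict]:
    """Hypothetical merge of per-collection results for an alias.

    collection_order stands in for the order the collections were
    given when the alias was created (an assumption, see option b>).
    """
    merged = []
    seen = set()
    for name in collection_order:
        for doc in results_by_collection.get(name, []):
            key = doc[unique_key]
            # Option a> off: the first collection in alias order wins.
            if key in seen and not keep_duplicates:
                continue
            seen.add(key)
            # Record where the doc came from (hypothetical marker field).
            merged.append({**doc, "_collection": name})
    return merged


results = {
    "C1": [{"id": "1", "title": "from C1"}],
    "C2": [{"id": "1", "title": "from C2"}, {"id": "2", "title": "only in C2"}],
}
deduped = merge_alias_results(["C1", "C2"], results)
both = merge_alias_results(["C1", "C2"], results, keep_duplicates=True)
```

With the alias order C1, C2 the deduplicated list always carries the C1 copy of doc 1, however many times the query is re-submitted. With keep_duplicates=True both copies come back, C1 first, which answers the "in what order" question by the same rule.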



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

