[jira] [Commented] (SOLR-9583) When the same <uniqueKey> exists across multiple collections that are searched with an alias, the document returned in the results list is indeterminate

David Smiley (JIRA) Fri, 30 Sep 2016 09:42:36 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15536432#comment-15536432
 ]


David Smiley commented on SOLR-9583:
------------------------------------

Sorry Erick... I simply mean that, AFAIK, the distributed-search feature has a 
fundamental assumption that no keys are duplicated across cores (shards). 
AFAIK that assumption hasn't changed since its inception (Solr 1.3?), despite 
SolrCloud and aliasing. If you violate that assumption, the behavior is 
"undefined": who knows what will happen. I think attempting to support 
duplicate keys raises bigger questions than simply resolving the particular 
effects you report here. Faceting, for example: I can't imagine the system 
efficiently deduplicating before counting. The same goes for something as 
simple as returning the matching doc count.
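
To illustrate the counting concern, here is a toy model in Python. It is not 
Solr's merge code; the response layout, field names, and the merge() helper 
are all hypothetical. It assumes each shard reports only aggregate numbers 
(a hit count and facet counts) plus its own top documents, so a coordinator 
that sums those numbers has no way to notice that two shards counted the same 
key.

```python
from collections import Counter

# Hypothetical per-shard responses (not Solr's wire format). Each doc is
# (unique key, score, source collection). "doc1" exists in both C1 and C2.
shard_responses = [
    {"num_found": 2, "facets": {"color:red": 2},
     "docs": [("doc1", 1.0, "C1"), ("doc2", 0.5, "C1")]},
    {"num_found": 1, "facets": {"color:red": 1},
     "docs": [("doc1", 1.0, "C2")]},
]


def merge(responses):
    """Toy coordinator: sum the counts, keep the first doc seen per key."""
    total = sum(r["num_found"] for r in responses)

    facets = Counter()
    for r in responses:
        facets.update(r["facets"])  # adds counts; duplicates are invisible here

    seen = {}
    for r in responses:
        for key, score, source in r["docs"]:
            # First response processed wins, so the result depends on order.
            seen.setdefault(key, (key, score, source))

    docs = sorted(seen.values(), key=lambda d: (-d[1], d[0]))
    return total, dict(facets), docs


total, facets, docs = merge(shard_responses)
reversed_docs = merge(list(reversed(shard_responses)))[2]

print(total, facets)       # summed counts include doc1 twice
print(docs[0])             # copy of doc1 from whichever response came first
print(reversed_docs[0])    # the other copy when the order is flipped
```

In this model the document list holds two distinct keys while the summed hit 
and facet counts say three, and which copy of doc1 is kept depends only on 
the order in which the responses are processed.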

> When the same <uniqueKey> exists across multiple collections that are 
> searched with an alias, the document returned in the results list is 
> indeterminate
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-9583
>                 URL: https://issues.apache.org/jira/browse/SOLR-9583
>             Project: Solr
>          Issue Type: Wish
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Erick Erickson
>
> Not quite sure whether to call this a bug or improvement...
> Currently if I have two collections C1 and C2 and an alias that points to 
> both _and_ I have a document in both collections with the _same_ <uniqueKey>, 
> the returned list sometimes has the doc from C1 and sometimes from C2.
> If I add shards.info=true I see the document found in each collection, but 
> only one in the document list. Which one changes if I re-submit the identical 
> query.
> This seems incorrect, perhaps a side effect of piggy-backing the collection 
> aliasing on searching multiple shards? (Thanks Shalin for that bit of 
> background).
> I can see both use-cases: 
> 1> aliasing multiple collections validly assumes that <uniqueKey>s should be 
> unique across them all and only one doc should be returned. Even in this case, 
> which doc is returned should be deterministic.
> 2> these are arbitrary collections without any a priori relationship and 
> identical <uniqueKey>s do NOT identify the "same" document, so both should be 
> returned.
> So I propose we do two things:
> a> provide a param for the CREATEALIAS command that controls whether docs 
> with the same <uniqueKey> from different collections should both be returned. 
> If they both should, there's still the question of in what order.
> b> provide a deterministic way to resolve dups from different collections. 
> What that algorithm is I'm not quite sure. The order the collections were 
> specified in the CREATEALIAS command? Some field in the documents? Other??? 
> What happens if this option is not specified on the CREATEALIAS command?
> Implicit in the above is my assumption that it's perfectly valid to have 
> different aliases in the same cluster behave differently if specified.
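
One of the options floated in (b), resolving duplicates by the order in which 
the collections were listed for the alias, could look roughly like the Python 
sketch below. Everything in it is an assumption made for illustration: the 
resolve() function, the keep_duplicates flag (standing in for the param 
proposed in (a)), and the tuple layout are hypothetical and not part of any 
existing Solr API.

```python
def resolve(docs, alias_order, keep_duplicates=False):
    """Order merged hits deterministically.

    docs: iterable of (unique_key, score, collection) tuples.
    alias_order: collection names in the order the alias lists them.
    keep_duplicates: hypothetical stand-in for the proposed alias param.
    """
    rank = {name: i for i, name in enumerate(alias_order)}

    def sort_key(doc):
        # Score descending, then alias position, then key: a total order,
        # so the output never depends on the order of the input.
        return (-doc[1], rank[doc[2]], doc[0])

    if keep_duplicates:
        return sorted(docs, key=sort_key)

    winners = {}
    # Visit docs from earlier-listed collections first so they win ties on key.
    for doc in sorted(docs, key=lambda d: rank[d[2]]):
        winners.setdefault(doc[0], doc)
    return sorted(winners.values(), key=sort_key)


hits = [("doc1", 0.9, "C2"), ("doc1", 0.8, "C1"), ("doc2", 0.5, "C2")]
deduped = resolve(hits, ["C1", "C2"])
both = resolve(hits, ["C1", "C2"], keep_duplicates=True)
print(deduped)
print(both)
```

With the alias order C1, C2, the C1 copy of doc1 is always the one returned 
when deduplicating, however the hits arrive. Note that this sketch prefers 
the earlier collection even when its copy scores lower, which is only one of 
several defensible choices.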



