[ 
https://issues.apache.org/jira/browse/SOLR-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328320#comment-15328320
 ] 

Shikha Somani commented on SOLR-8297:
-------------------------------------

Below are two proposed solutions to “Allow join query over 2 sharded 
collections”, i.e. fixes for the join functionality that is broken in Solr 5.x. 
This is not an enhancement to support joins across multiple shards present on 
the same JVM.

*Proposed solutions*:
# *Distributed join with Range*: This allows joins with greater flexibility by 
considering the shard range, instead of the shard name (a rigid criterion), 
when selecting the fromCollection replica. The current implementation requires 
fromCollection to be singly sharded; with this solution fromCollection can be 
singly sharded, sharded identically to toCollection, or have shard ranges that 
overlap with those of toCollection.

** *Solution details*: A new parameter “joinMode” will be introduced. It 
governs how the fromCollection replica is selected based on shard range.
Possible values of joinMode:
#** *Exact*: The “fromCollection” shard range must exactly match the range of 
the “toCollection” shard present on that node; only then is the join applied 
between the two collections. This is the _default_ value.
#** *Overlap*: The shard range of “fromCollection” must overlap with the range 
of the “toCollection” shard on the given node.
#** *Any*: No range check is performed; any replica of fromCollection present 
on that node is picked and the join is applied.
# *Non-distributed join*: This works the same way as in Solr 4.x. The client 
specifies the exact replica of “fromCollection” with which the join is applied. 
“distrib=false” must be passed in the query parameters.
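
As a rough illustration of the range-based selection in the first option, the 
sketch below models the three proposed joinMode values in Java. All names in 
it (JoinModeSketch, Range, LocalReplica, select) are hypothetical and are not 
part of Solr's API, and representing a shard range as an inclusive pair of int 
hash values is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the proposed joinMode selection; not Solr's actual API.
final class JoinModeSketch {

    // The three joinMode values proposed in the comment.
    enum JoinMode { EXACT, OVERLAP, ANY }

    // Inclusive hash range covered by a shard (assumed representation).
    static final class Range {
        final int min;
        final int max;

        Range(int min, int max) {
            this.min = min;
            this.max = max;
        }

        boolean sameAs(Range other) {
            return min == other.min && max == other.max;
        }

        boolean overlaps(Range other) {
            return min <= other.max && other.min <= max;
        }
    }

    // A fromCollection replica hosted on the node that executes the join.
    static final class LocalReplica {
        final String coreName;
        final Range range;

        LocalReplica(String coreName, Range range) {
            this.coreName = coreName;
            this.range = range;
        }
    }

    // Returns the first local fromCollection replica that is acceptable under
    // the given mode for the toCollection shard range, or empty if none is.
    static Optional<LocalReplica> select(JoinMode mode, Range toShard,
                                         List<LocalReplica> localFromReplicas) {
        for (LocalReplica replica : localFromReplicas) {
            boolean accepted;
            switch (mode) {
                case EXACT:
                    accepted = replica.range.sameAs(toShard);
                    break;
                case OVERLAP:
                    accepted = replica.range.overlaps(toShard);
                    break;
                default: // ANY: no range check at all
                    accepted = true;
                    break;
            }
            if (accepted) {
                return Optional.of(replica);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Range toShard = new Range(0, Integer.MAX_VALUE);
        List<LocalReplica> local = Arrays.asList(
            new LocalReplica("from_shard1_replica1", new Range(Integer.MIN_VALUE, 99)),
            new LocalReplica("from_shard2_replica1", new Range(0, Integer.MAX_VALUE)));
        for (JoinMode mode : JoinMode.values()) {
            String chosen = select(mode, toShard, local)
                .map(r -> r.coreName)
                .orElse("none");
            System.out.println(mode + " -> " + chosen);
        }
    }
}
```

In this sketch an empty result under Exact or Overlap stands for the case 
where no suitable fromCollection replica is on the node; how the real 
implementation should report that case is left open here.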

If this approach looks fine, I will submit a PR with the fix.

> Allow join query over 2 sharded collections: enhance functionality and 
> exception handling
> -----------------------------------------------------------------------------------------
>
>                 Key: SOLR-8297
>                 URL: https://issues.apache.org/jira/browse/SOLR-8297
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>    Affects Versions: 5.3
>            Reporter: Paul Blanchaert
>
> Enhancement based on SOLR-4905. New Jira issue raised as suggested by Mikhail 
> Khludnev.
> A) exception handling:
> The exception "SolrCloud join: multiple shards not yet supported" thrown in 
> the function findLocalReplicaForFromIndex of JoinQParserPlugin is not 
> triggered correctly: In my use-case, I've a join on a facet.query and when my 
> results are only found in 1 shard and the facet.query with the join is 
> querying the last replica of the last slice, then the exception is not thrown.
> I believe it's better to verify the nr of slices when we want to verify the  
> "multiple shards not yet supported" exception (so exception is thrown when 
> zkController.getClusterState().getSlices(fromIndex).size()>1).
> B) functional enhancement:
> I would expect that there is no problem to perform a cross-core join over 
> sharded collections when the following conditions are met:
> 1) both collections are sharded with the same replicationFactor and numShards
> 2) router.field of the collections is set to the same "key-field" (collection 
> of "fromindex" has router.field = "from" field and collection joined to has 
> router.field = "to" field)
> The router.field setup ensures that documents with the same "key-field" are 
> routed to the same node. 
> So the combination based on the "key-field" should always be available within 
> the same node.
> From a user perspective, I believe these assumptions seem to be a "normal" 
> use-case in the cross-core join in SolrCloud.
> Hope this helps



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
