David Smiley commented on SOLR-13749:

My _only_ concern with a non-whitelisted zkHost is that you could join from 
another cluster (you call this the "destination" but I find that orientation 
confusing) and _maybe_ somehow that could be used to get that data out?  I 
don't know how it could; doesn't seem realistically useful to a hacker.  And 
besides, additional network or other security measures could exist to further 
protect from that.  So never mind.

> Implement support for joining across collections with multiple shards ( XCJF )
> ------------------------------------------------------------------------------
>                 Key: SOLR-13749
>                 URL: https://issues.apache.org/jira/browse/SOLR-13749
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Kevin Watters
>            Assignee: Gus Heck
>            Priority: Blocker
>             Fix For: 8.6
>         Attachments: 2020-03 Smiley with ASF hat.jpeg
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
> This ticket includes 2 query parsers.
> The first one is the "Cross-collection join filter" (XCJF) query parser. It 
> can call out to a remote collection to get a set of join keys to be used as 
> a filter against the local collection.
> The second one is the Hash Range query parser. You specify a field name and 
> a hash range, and only the documents whose field value hashes into that 
> range are returned.
> The XCJF query parser does an intersection based on join keys between 2 
> collections.
> The local collection is the collection that you are searching against.
> The remote collection is the collection that contains the join keys that you 
> want to use as a filter.
> Each shard participating in the distributed request will execute a query 
> against the remote collection.  If the local collection is set up with the 
> compositeId router to be routed on the join key field, a hash range query is 
> applied to the remote collection query to only match the documents that 
> contain a potential match for the documents that are in the local shard/core. 
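
To illustrate the routed case described above, here is a minimal Python sketch, not Solr's implementation. It assumes a stand-in 32-bit hash (`zlib.crc32`; Solr's compositeId router uses a different hash function), two made-up shard ranges, and hypothetical helper names (`key_hash`, `in_range`, `keys_for_shard`). The idea is that each local shard keeps only the remote join keys whose hash falls inside its own range.

```python
import zlib

def key_hash(value: str) -> int:
    """Stand-in 32-bit hash (assumption; Solr uses a different hash function)."""
    return zlib.crc32(value.encode("utf-8")) & 0xFFFFFFFF

def in_range(value: str, lower: int, upper: int) -> bool:
    """True if the hash of value lies in the inclusive range [lower, upper]."""
    return lower <= key_hash(value) <= upper

def keys_for_shard(remote_keys, lower, upper):
    """Keep only the join keys that could match documents in one local shard."""
    return {k for k in remote_keys if in_range(k, lower, upper)}

# Two hypothetical shards that split the 32-bit hash space in half.
SHARD_RANGES = {
    "shard1": (0x00000000, 0x7FFFFFFF),
    "shard2": (0x80000000, 0xFFFFFFFF),
}

remote_keys = {"vin-001", "vin-002", "vin-003", "vin-004"}
per_shard = {
    name: keys_for_shard(remote_keys, lo, hi)
    for name, (lo, hi) in SHARD_RANGES.items()
}
```

Because the ranges partition the hash space, every key lands in exactly one shard's set, which is why the routed mode can fetch fewer keys per shard.
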
> Here's some vocab to help with the descriptions of the various parameters.
> ||Term||Description||
> |Local Collection|This is the main collection that is being queried.|
> |Remote Collection|This is the collection that the XCJFQuery will query to 
> resolve the join keys.|
> |XCJFQuery|The lucene query that executes a search to get back a set of join 
> keys from a remote collection|
> |HashRangeQuery|The lucene query that matches only the documents whose hash 
> code on a field falls within a specified range.|
> ||Param ||Required ||Description||
> |collection|Required|The name of the external Solr collection to be queried 
> to retrieve the set of join key values.|
> |zkHost|Optional|The connection string to be used to connect to ZooKeeper.  
> zkHost and solrUrl are both optional parameters, and at most one of them 
> should be specified.  
> If neither zkHost nor solrUrl is specified, the local ZooKeeper cluster 
> will be used.|
> |solrUrl|Optional|The URL of the external Solr node to be queried.|
> |from|Required|The join key field name in the external collection.|
> |to|Required|The join key field name in the local collection.|
> |v|See Note|The query to be executed against the external Solr collection to 
> retrieve the set of join key values.  
> Note:  The original query can be passed at the end of the string or as the 
> "v" parameter.  
> It's recommended to use query parameter substitution with the "v" parameter 
> to ensure no issues arise with the default query parsers.|
> |routed| |true / false.  If true, the XCJF query will use each shard's hash 
> range to determine the set of join keys to retrieve for that shard.
> This parameter improves the performance of the cross-collection join, but 
> it depends on the local collection being routed by the toField.  If this 
> parameter is not specified, 
> the XCJF query will try to determine the correct value automatically.|
> |ttl| |The length of time that an XCJF query in the cache will be considered 
> valid, in seconds.  Defaults to 3600 (one hour).  
> The XCJF query will not be aware of changes to the remote collection, so 
> if the remote collection is updated, cached XCJF queries may give inaccurate 
> results.  
> After the ttl period has expired, the XCJF query will re-execute the join 
> against the remote collection.|
> |_All others_| |Any normal Solr parameter can also be specified as a local 
> param.|
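
The ttl behaviour in the table above can be pictured with a small Python sketch. This is only an illustration of the described semantics, not the XCJF cache itself: `TtlJoinKeyCache`, its `fetch` callback, and the injectable `clock` are hypothetical names introduced here.

```python
import time

class TtlJoinKeyCache:
    """Caches join keys per remote query and re-fetches once the ttl has expired."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch      # callable: query string -> set of join keys
        self._ttl = ttl_seconds  # 3600 mirrors the default stated in the table
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # query -> (fetched_at, keys)

    def get(self, query):
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and now - entry[0] < self._ttl:
            # Still valid: changes to the remote collection are NOT seen here.
            return entry[1]
        # Missing or expired: re-execute against the remote collection.
        keys = self._fetch(query)
        self._entries[query] = (now, keys)
        return keys
```

The sketch makes the trade-off visible: until the ttl elapses, updates to the remote collection are not reflected in the cached keys.
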
> Example solrconfig.xml changes:
>  <cache name="hash_vin"
>         class="solr.LRUCache"
>         size="128"
>         initialSize="0"
>         regenerator="solr.NoOpRegenerator"/>
>  <queryParser name="xcjf" class="org.apache.solr.search.join.XCJFQueryParserPlugin">
>    <str name="routerField">vin</str>
>  </queryParser>
>  <queryParser name="hash_range" class="org.apache.solr.search.join.HashRangeQueryParserPlugin" />
> Example Usage:
> {!xcjf collection="otherCollection" from="fromField" to="toField" v="*:*"}
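
As a rough sketch of the recommendation to pass the remote query through parameter substitution, the Python below builds request parameters for a filter query. It assumes the parser is registered as `xcjf` as in the config above; `build_xcjf_params`, the request parameter name `xcjfQuery`, and the main query `*:*` are illustrative choices, not part of the patch. It also enforces the rule that at most one of zkHost and solrUrl is given.

```python
from urllib.parse import urlencode

def build_xcjf_params(collection, from_field, to_field, join_query,
                      zk_host=None, solr_url=None):
    """Return request parameters for an XCJF filter using v=$param substitution."""
    if zk_host is not None and solr_url is not None:
        raise ValueError("specify at most one of zkHost and solrUrl")
    local_params = [
        f'collection="{collection}"',
        f'from="{from_field}"',
        f'to="{to_field}"',
    ]
    if zk_host is not None:
        local_params.append(f'zkHost="{zk_host}"')
    if solr_url is not None:
        local_params.append(f'solrUrl="{solr_url}"')
    # The remote query is dereferenced from a separate request parameter,
    # so it does not have to be escaped inside the local-params string.
    local_params.append("v=$xcjfQuery")
    return {
        "q": "*:*",  # main query against the local collection (placeholder)
        "fq": "{!xcjf " + " ".join(local_params) + "}",
        "xcjfQuery": join_query,
    }

params = build_xcjf_params("otherCollection", "fromField", "toField", "*:*")
query_string = urlencode(params)  # ready to append to a /select URL
```

Keeping the remote query in its own parameter avoids nested quoting problems when it contains spaces, quotes, or braces.
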

This message was sent by Atlassian Jira

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
