[ 
https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Watters updated SOLR-13749:
---------------------------------
    Description: 
This ticket includes two query parsers.

The first is the "Cross-collection join filter" (XCJF) query parser. It calls out to a remote collection to retrieve a set of join keys, which are then used as a filter against the local collection.

The second is the Hash Range query parser. Given a field name and a hash range, it returns only the documents whose value in that field hashes into that range.

The XCJF query parser computes an intersection, based on join keys, between two collections.

The local collection is the collection that you are searching against.

The remote collection is the collection that contains the join keys that you 
want to use as a filter.
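
As a rough illustration of the join semantics only (not of how the parser is implemented), the sketch below models both collections as in-memory lists of dicts. The function name `xcjf_filter`, the `recalled` field, and the sample documents are hypothetical and exist only for this example.

```python
def xcjf_filter(local_docs, remote_docs, from_field, to_field, remote_query):
    """Keep local docs whose to_field value is among the remote join keys."""
    # Step 1: run the query against the "remote collection" and collect
    # the distinct join key values from the matching documents.
    join_keys = {
        doc[from_field]
        for doc in remote_docs
        if from_field in doc and remote_query(doc)
    }
    # Step 2: use those keys as a filter against the "local collection".
    return [doc for doc in local_docs if doc.get(to_field) in join_keys]


# Hypothetical sample data: the remote collection flags recalled vehicles.
remote = [
    {"id": "r1", "vin": "A", "recalled": True},
    {"id": "r2", "vin": "B", "recalled": False},
    {"id": "r3", "vin": "C", "recalled": True},
]
local = [
    {"id": "l1", "vin": "A"},
    {"id": "l2", "vin": "B"},
    {"id": "l3", "vin": "C"},
    {"id": "l4", "vin": "D"},
]

matches = xcjf_filter(local, remote, "vin", "vin", lambda d: d["recalled"])
print([d["id"] for d in matches])
```

Only the local documents whose join key also appears in a matching remote document survive the filter, which is the intersection described above.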

Each shard participating in the distributed request executes a query 
against the remote collection.  If the local collection is set up with the 
compositeId router and routed on the join key field, a hash range query is 
added to the remote collection query so that it matches only the documents 
that could potentially match documents in the local shard/core.  
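
The effect of the hash range restriction can be sketched as follows. This is a simplified model under stated assumptions: CRC32 stands in for the hash function (it is not the hash Solr uses for document routing), the hash space is split evenly over four hypothetical shards, and the helper names are made up for the example.

```python
import zlib


def key_hash(value: str) -> int:
    # Stand-in hash for illustration only; NOT Solr's routing hash.
    return zlib.crc32(value.encode("utf-8"))


def split_ranges(num_shards: int):
    """Divide the unsigned 32-bit hash space into contiguous inclusive ranges."""
    size = 2**32 // num_shards
    ranges = []
    for i in range(num_shards):
        lower = i * size
        # The last range absorbs any remainder so the whole space is covered.
        upper = 2**32 - 1 if i == num_shards - 1 else (i + 1) * size - 1
        ranges.append((lower, upper))
    return ranges


def in_range(value: str, hash_range) -> bool:
    lower, upper = hash_range
    return lower <= key_hash(value) <= upper


# Hypothetical join keys returned by the remote collection.
remote_keys = ["vin-%03d" % i for i in range(100)]
ranges = split_ranges(4)

# Each local shard only asks for the keys that hash into its own range,
# instead of pulling every join key from the remote collection.
per_shard = [[k for k in remote_keys if in_range(k, r)] for r in ranges]
print([len(keys) for keys in per_shard])
```

Because the ranges are disjoint and cover the whole hash space, every join key is fetched by exactly one local shard, so each shard handles only a portion of the full key set.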

 

The following terms are used in the parameter descriptions below.
||Term||Description||
|Local Collection|The main collection that is being queried.|
|Remote Collection|The collection that the XCJFQuery queries to resolve the join keys.|
|XCJFQuery|The Lucene query that executes a search to retrieve a set of join keys from a remote collection.|
|HashRangeQuery|The Lucene query that matches only the documents whose hash code on a field falls within a specified range.|

 

 
||Param||Default||Required||Description||
|collection| |Required|The name of the external Solr collection to be queried to retrieve the set of join key values.|
|zkHost| |Optional|The connection string used to connect to ZooKeeper.  zkHost and solrUrl are both optional, and at most one of them should be specified.  If neither zkHost nor solrUrl is specified, the local ZooKeeper cluster is used.|
|solrUrl| |Optional|The URL of the external Solr node to be queried.|
|from| |Required|The join key field name in the external collection.|
|to| |Required|The join key field name in the local collection.|
|v| |See Note|The query to be executed against the external Solr collection to retrieve the set of join key values.  Note:  The query can be passed at the end of the string or as the "v" parameter.  Using query parameter substitution with the "v" parameter is recommended, to avoid issues with the default query parsers.|
|routed|See Notes| |true / false.  If true, the XCJF query uses each shard's hash range to determine the set of join keys to retrieve for that shard.  This improves the performance of the cross-collection join, but it depends on the local collection being routed by the "to" field.  If this parameter is not specified, the XCJF query tries to determine the correct value automatically.|
|ttl|3600| |The length of time, in seconds, that an XCJF query in the cache is considered valid.  Defaults to 3600 (one hour).  The XCJF query is not aware of changes to the remote collection, so if the remote collection is updated, cached XCJF queries may give inaccurate results.  After the ttl period has expired, the XCJF query re-executes the join against the remote collection.|
|_All others_| | |Any normal Solr parameter can also be specified as a local param.|
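
The ttl behavior in the table can be pictured with a small time-based cache. This is a sketch of the general technique only, not the caching code in the patch; the class name `TtlJoinCache`, its methods, and the injected clock are assumptions made for the example.

```python
import time


class TtlJoinCache:
    """Caches join key sets and re-fetches them once the ttl has elapsed."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # cache key -> (created_at, join_keys)

    def get(self, key, fetch_join_keys):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            # Still valid: changes made to the remote collection since
            # created_at are NOT visible here.
            return entry[1]
        # Missing or expired: re-execute the join against the remote side.
        join_keys = fetch_join_keys()
        self._entries[key] = (now, join_keys)
        return join_keys


# Usage with a fake clock so the example does not need to sleep.
fake_now = [0.0]
calls = []


def fetch():
    calls.append(fake_now[0])
    return {"A", "C"}


cache = TtlJoinCache(ttl_seconds=3600, clock=lambda: fake_now[0])
cache.get("q", fetch)  # first use: fetches from the remote collection
fake_now[0] = 1800.0
cache.get("q", fetch)  # within the ttl: served from the cache
fake_now[0] = 3600.0
cache.get("q", fetch)  # ttl expired: fetches again
print(calls)
```

A shorter ttl narrows the window in which stale join keys can be served, at the cost of more frequent remote queries.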

 

Example solrconfig.xml changes:

 
{{<cache name="hash_vin"}}
{{       class="solr.LRUCache"}}
{{       size="128"}}
{{       initialSize="0"}}
{{       regenerator="solr.NoOpRegenerator"/>}}
 
{{<queryParser name="xcjf" class="org.apache.solr.search.join.XCJFQueryParserPlugin">}}
{{  <str name="routerField">vin</str>}}
{{</queryParser>}}
 
{{<queryParser name="hash_range" class="org.apache.solr.search.join.HashRangeQueryParserPlugin"/>}}
 

Example Usage:

{{{!xcjf collection="otherCollection" from="fromField" to="toField" v="*:*"}}}
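
The recommendation to pass the remote query through parameter substitution could look like the sketch below, which builds the request parameters for a filter query. It relies on Solr's standard local-params dereferencing syntax ({{v=$name}}); the parameter name {{joinQuery}}, the helper {{build_xcjf_params}}, and the {{recalled:true}} query are hypothetical.

```python
from urllib.parse import urlencode, parse_qs


def build_xcjf_params(collection, from_field, to_field, join_query, main_query="*:*"):
    """Build request parameters for an XCJF filter query.

    The remote query is sent as its own request parameter and referenced
    from the local params via v=$joinQuery, so it needs no escaping inside
    the {!xcjf ...} string.
    """
    local_params = '{!xcjf collection="%s" from="%s" to="%s" v=$joinQuery}' % (
        collection,
        from_field,
        to_field,
    )
    return {"q": main_query, "fq": local_params, "joinQuery": join_query}


params = build_xcjf_params("otherCollection", "fromField", "toField", "recalled:true")
query_string = urlencode(params)
print(query_string)
```

The resulting query string can be appended to a select request against the local collection; the join query can then contain quotes or spaces without interfering with the local-params parsing.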
 
 

 

 

 


> Implement support for joining across collections with multiple shards ( XCJF )
> ------------------------------------------------------------------------------
>
>                 Key: SOLR-13749
>                 URL: https://issues.apache.org/jira/browse/SOLR-13749
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Kevin Watters
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
