[ 
https://issues.apache.org/jira/browse/SOLR-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Domenico Fabio Marino updated SOLR-10880:
-----------------------------------------
    Description: 
Add a mechanism to allow queries to use only a subset of replicas (by 
specifying the wanted replica tag).

Replicas have to be marked with tags before running the query.

*Setup needed from the replica side*
Set the properties to filter on to the desired values on at least one replica.
----
*Setup needed from the query side*

A query has to set {{ShardParams.FILTER_BY_REPLICA_PROPERTY}} to {{true}} to 
opt in to replica property filtering.
It should then set {{ShardParams.SHARDS_FILTER}} or 
{{ShardParams.SHARDS_FILTERNOT}} to {{ShardParams.REPLICA_PROP}}, followed by 
the name of the property to check, then ":", then the wanted value.
Example:
Given that some replicas have a property named {{region}}:

Adding the following params to the query:
{{filterByReplicaProp=true&shards.filter=replicaProp.region:EMEA}}
will ensure that the query uses only replicas that have the property 
{{region}} set to {{EMEA}}.

{{filterByReplicaProp=true&shards.filterNot=replicaProp.region:EMEA}}
will ensure that the query *does not* use replicas that have the property 
{{region}} set to {{EMEA}}.
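
As a rough illustration of the two parameter sets above, the following sketch 
assembles them as plain strings. The class {{ReplicaFilterParams}} and its 
methods are hypothetical names and are not part of this patch or of SolrJ; the 
parameter names are copied from the example, and values are not URL-encoded.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper (not part of Solr): builds the raw query parameters from the example. */
final class ReplicaFilterParams {
    // Literal parameter names as they appear in the example in this description.
    static final String FILTER_BY_REPLICA_PROP = "filterByReplicaProp";
    static final String SHARDS_FILTER = "shards.filter";
    static final String SHARDS_FILTER_NOT = "shards.filterNot";
    static final String REPLICA_PROP_PREFIX = "replicaProp.";

    /**
     * include=true  selects replicas whose property equals value (shards.filter);
     * include=false excludes them (shards.filterNot).
     */
    static Map<String, String> build(String property, String value, boolean include) {
        Map<String, String> params = new LinkedHashMap<>();
        // The opt-in flag always comes first, as in the example.
        params.put(FILTER_BY_REPLICA_PROP, "true");
        params.put(include ? SHARDS_FILTER : SHARDS_FILTER_NOT,
                   REPLICA_PROP_PREFIX + property + ":" + value);
        return params;
    }

    /** Joins the parameters with '&'. No URL encoding is applied in this sketch. */
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(build("region", "EMEA", true)));
        System.out.println(toQueryString(build("region", "EMEA", false)));
    }
}
```

Running {{main}} prints the two parameter strings shown in the example, in the 
same order.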
----

An example can be seen in the {{ReplicaTagTest}} included in this patch where a 
dynamic cloud has some tags assigned to it both randomly and on a fixed basis.

A replica can have multiple tags attached to it. By default the tags are 
separated by "|" (pipe character); the delimiter can be changed by setting 
{{ShardParams.REPLICA_TAG_DELIMITER}} in the query.
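
How a multi-tag property value might be split and matched can be sketched as 
follows. This is an assumption about the behaviour, not the code in the patch: 
{{ReplicaTagSplitter}} and its methods are hypothetical names, and the sketch 
treats the delimiter as a literal string rather than a regular expression.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Solr): splits a multi-tag property value. */
final class ReplicaTagSplitter {
    // Default delimiter named in this description.
    static final String DEFAULT_DELIMITER = "|";

    /** Splits propertyValue on the delimiter; a null or empty delimiter falls back to "|". */
    static List<String> splitTags(String propertyValue, String delimiter) {
        if (propertyValue == null || propertyValue.isEmpty()) {
            return Collections.emptyList();
        }
        String d = (delimiter == null || delimiter.isEmpty()) ? DEFAULT_DELIMITER : delimiter;
        // "|" is a regex metacharacter, so quote the delimiter before passing it to split().
        return Arrays.asList(propertyValue.split(Pattern.quote(d)));
    }

    /** True if one of the tags equals the wanted value exactly (no substring match). */
    static boolean hasTag(String propertyValue, String delimiter, String wanted) {
        return splitTags(propertyValue, delimiter).contains(wanted);
    }
}
```

For instance, {{hasTag("EMEA|APAC", null, "APAC")}} is true under these 
assumptions, while {{hasTag("EMEA|APAC", null, "EME")}} is false because tags 
are compared whole.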

The {{ShardParams.FILTER_BY_REPLICA_PROPERTY}} flag is needed because the 
computation required to filter by property:value is quite costly, and queries 
that do not use replica filtering should not incur that performance penalty.

The {{ShardParams.REPLICA_PROP}} prefix (currently set to {{replicaProp.}}) is 
needed to keep the system extensible in the future.

*Usage warnings*

Using {{ShardParams.SHARDS_FILTER}} or {{ShardParams.SHARDS_FILTERNOT}} set to 
{{ShardParams.REPLICA_PROP}} without {{ShardParams.FILTER_BY_REPLICA_PROPERTY}} 
will cause the {{QueryComponent}} to throw exceptions.

Using {{ShardParams.FILTER_BY_REPLICA_PROPERTY}} without filters will not cause 
any error, but will likely waste computation time.
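
A client could guard against both situations before sending the query. The 
check below is only a sketch of that idea; {{ReplicaFilterParamCheck}} and its 
{{Result}} values are hypothetical, and the parameter names are the literal 
ones from the example earlier in this description.

```java
import java.util.Map;

/** Hypothetical client-side check (not part of Solr) for the two cases described above. */
final class ReplicaFilterParamCheck {
    enum Result { OK, MISSING_FLAG, FLAG_WITHOUT_FILTER }

    static Result check(Map<String, String> params) {
        // Boolean.parseBoolean(null) is false, so a missing flag counts as "not set".
        boolean flag = Boolean.parseBoolean(params.get("filterByReplicaProp"));
        boolean propFilter = usesReplicaProp(params.get("shards.filter"))
                || usesReplicaProp(params.get("shards.filterNot"));
        if (propFilter && !flag) {
            return Result.MISSING_FLAG;        // described as causing QueryComponent exceptions
        }
        if (flag && !propFilter) {
            return Result.FLAG_WITHOUT_FILTER; // no error, but likely wasted computation
        }
        return Result.OK;
    }

    private static boolean usesReplicaProp(String filterValue) {
        return filterValue != null && filterValue.startsWith("replicaProp.");
    }
}
```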

No validity check is performed on the tags. As a result, the array of shard 
URLs may contain empty URLs, or may be null (when the property does not 
exist); the user of this feature has to handle both cases.
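
One way a caller might deal with such a result is sketched below. 
{{ShardUrlSanitizer}} is a hypothetical name, and the URL in the usage note is 
made up for illustration; the sketch only assumes that the result arrives as a 
{{String[]}} that may be null or contain null or blank entries.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Solr): drops unusable entries from a shard URL array. */
final class ShardUrlSanitizer {
    /** Returns the non-null, non-blank URLs; a null array yields an empty list. */
    static List<String> usable(String[] shardUrls) {
        List<String> result = new ArrayList<>();
        if (shardUrls == null) {
            return result; // e.g. the property does not exist on any replica
        }
        for (String url : shardUrls) {
            if (url != null && !url.trim().isEmpty()) {
                result.add(url);
            }
        }
        return result;
    }
}
```

A caller would then treat an empty list as "no replica matched", for example 
after {{usable(new String[] {"http://host1:8983/solr/c1", "", null})}}, which 
keeps only the first entry.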


  was:
Add a mechanism to allow queries to use only a subset of replicas (by 
specifying the wanted replica tag).
Replicas have to be marked with tags before running the query.

A query has to specify ShardParams.FILTER_BY_REPLICA_PROPERTY to specify that 
it is indeed interested in replica property filtering.
Then it should specify ShardParams.FILTER or ShardParams.FILTERNOT set to 
ShardParams.REPLICA_PROP followed by the property that has to be checked 
followed by ":" and then the value wanted.
Example:
ShardParams.FILTER_BY_REPLICA_PROPERTY = "true"&
To use this system, replicas need to be tagged. Tagging a replica involves 
setting ShardParams.REPLICA_TAG_NAME to a property name and then setting that 
property on the replicas.
An example can be seen in the ReplicaTagTest included in this patch where a 
dynamic cloud has some tags assigned to it both randomly and on a fixed basis.

A replica can have multiple tags attached to it. By default the tags are 
separated by "|" (pipe character); the delimiter can be changed by setting 
ShardParams.REPLICA_TAG_DELIMITER in the query.

No validity check is performed on the tags. As a result, the array of shard 
URLs may contain empty URLs, or may be null (when the property does not 
exist); the user of this feature has to handle both cases.

The tag to replica mappings are rebuilt for each query that specifies 
ShardParams.REPLICA_TAG_NAME. 


> Support replica filtering by tag
> --------------------------------
>
>                 Key: SOLR-10880
>                 URL: https://issues.apache.org/jira/browse/SOLR-10880
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Domenico Fabio Marino
>         Attachments: SOLR-10880.patch, SOLR-10880.patch, SOLR-10880.patch, 
> SOLR-10880.patch, SOLR-10880.patch



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
