[
https://issues.apache.org/jira/browse/SOLR-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Domenico Fabio Marino updated SOLR-10880:
-----------------------------------------
Description:
Add a mechanism that lets queries use only a subset of replicas (selected by
specifying the wanted replica tag).
Replicas have to be marked with tags before running the query.
*Setup needed from the replica side*
Set the required properties to the required values in at least one replica.
----
*Setup needed from the query side*
A query has to set {{ShardParams.FILTER_BY_REPLICA_PROPERTY}} to indicate that
it wants replica property filtering.
It should then set {{ShardParams.SHARDS_FILTER}} or
{{ShardParams.SHARDS_FILTERNOT}} to {{ShardParams.REPLICA_PROP}} followed by
the property to check, then ":", then the wanted value.
Example:
Given that some replicas have a property named {{region}}, adding the
following params to the query:
{{filterByReplicaProp=true&shards.filter=replicaProp.region:EMEA}}
will ensure that the query uses replicas that have the property {{region}} set
to {{EMEA}}.
{{filterByReplicaProp=true&shards.filterNot=replicaProp.region:EMEA}}
will ensure that the query *does not* use replicas that have the property
{{region}} set to {{EMEA}}.
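As a rough illustration of the two parameter combinations above, the following
sketch assembles the query string by hand. The class and method names
({{ReplicaFilterQuery}}, {{params}}, {{toQueryString}}) are hypothetical and
not part of the patch; only the literal parameter names and the
{{replicaProp.}} prefix are taken from the examples above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of the patch: builds the request
// parameters shown in the examples above.
class ReplicaFilterQuery {
    // Literal values copied from the examples in this description.
    static final String FILTER_BY_REPLICA_PROPERTY = "filterByReplicaProp";
    static final String SHARDS_FILTER = "shards.filter";
    static final String SHARDS_FILTERNOT = "shards.filterNot";
    static final String REPLICA_PROP = "replicaProp.";

    // filterParam is either SHARDS_FILTER or SHARDS_FILTERNOT.
    static Map<String, String> params(String filterParam, String property, String value) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put(FILTER_BY_REPLICA_PROPERTY, "true");
        p.put(filterParam, REPLICA_PROP + property + ":" + value);
        return p;
    }

    // Joins the params with "&". Values are NOT URL-encoded here;
    // a real client should encode them before sending.
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(params(SHARDS_FILTER, "region", "EMEA")));
        System.out.println(toQueryString(params(SHARDS_FILTERNOT, "region", "EMEA")));
    }
}
```

Setting the flag and the filter in one helper keeps the two from drifting
apart, which matters given the usage warnings in this description.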
----
An example can be seen in the {{ReplicaTagTest}} included in this patch where a
dynamic cloud has some tags assigned to it both randomly and on a fixed basis.
A replica can have multiple tags attached to it. By default these tags are
separated by "|" (pipe character); the delimiter can be changed by setting
{{ShardParams.REPLICA_TAG_DELIMITER}} in the query to anything else.
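One plausible reading of the multi-tag behaviour is sketched below: a property
value is split on the delimiter and a replica matches if any resulting tag
equals the wanted value. This is an assumption about the semantics, not the
patch's actual code, and {{ReplicaTags}}, {{split}} and {{hasTag}} are
hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of delimiter-based tag matching; the real
// matching logic lives in the patch and may differ.
class ReplicaTags {
    // Default delimiter as stated in this description.
    static final String DEFAULT_DELIMITER = "|";

    // Splits a property value such as "EMEA|APAC" into its tags.
    // Pattern.quote is required because "|" is a regex metacharacter.
    static List<String> split(String propertyValue, String delimiter) {
        return Arrays.asList(propertyValue.split(Pattern.quote(delimiter)));
    }

    // True if the property value contains the wanted tag exactly.
    // A missing property (null) never matches.
    static boolean hasTag(String propertyValue, String wanted, String delimiter) {
        return propertyValue != null && split(propertyValue, delimiter).contains(wanted);
    }
}
```

Quoting the delimiter is the detail most easily missed: passing a bare "|" to
{{String.split}} would be treated as regex alternation and split the value
between every character.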
The {{ShardParams.FILTER_BY_REPLICA_PROPERTY}} flag is needed because the
computation required to filter by property:value is quite complex, and queries
that don't care about replica filtering should not incur the performance
penalty.
The {{ShardParams.REPLICA_PROP}} prefix (currently set to {{replicaProp.}}) is
needed to ensure that the system is extensible in the future.
*Usage warnings*
Using {{ShardParams.SHARDS_FILTER}} or {{ShardParams.SHARDS_FILTERNOT}} set to
{{ShardParams.REPLICA_PROP}} without {{ShardParams.FILTER_BY_REPLICA_PROPERTY}}
will cause the {{QueryComponent}} to throw exceptions.
Using {{ShardParams.FILTER_BY_REPLICA_PROPERTY}} without filters will not cause
any error, but will likely waste computation time.
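The two warnings above could be caught on the client before a request is sent.
The sketch below is only an illustration under that assumption:
{{ReplicaFilterCheck}}, {{Status}} and {{check}} are hypothetical names, and
the literal parameter names are the ones used in the example in this
description.

```java
import java.util.Map;

// Hypothetical client-side pre-check; not part of the patch.
class ReplicaFilterCheck {
    enum Status { OK, WILL_FAIL, WASTEFUL }

    // True if the filter value uses the replica property prefix.
    static boolean usesReplicaProp(String filterValue) {
        return filterValue != null && filterValue.startsWith("replicaProp.");
    }

    static Status check(Map<String, String> params) {
        // Boolean.parseBoolean returns false for null, i.e. flag absent.
        boolean flag = Boolean.parseBoolean(params.get("filterByReplicaProp"));
        boolean propFilter = usesReplicaProp(params.get("shards.filter"))
                || usesReplicaProp(params.get("shards.filterNot"));
        if (propFilter && !flag) {
            return Status.WILL_FAIL;  // described above as causing exceptions
        }
        if (flag && !propFilter) {
            return Status.WASTEFUL;   // no error, but wasted computation
        }
        return Status.OK;
    }
}
```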
No validity check is performed on the tags. One may therefore get an array of
shard URLs that contains empty URLs, or that is null (when the property does
not exist); the user of this feature has to deal with it.
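A caller might guard against the null and empty cases along the following
lines. This is a minimal sketch, assuming the shard URLs arrive as a
{{String[]}}; {{ShardUrlGuard}} and {{usable}} are hypothetical names, not
part of the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical defensive helper for callers of this feature.
class ShardUrlGuard {
    // Returns only the non-null, non-blank URLs. A null array (for
    // example when the property does not exist) yields an empty list.
    static List<String> usable(String[] shardUrls) {
        List<String> result = new ArrayList<>();
        if (shardUrls == null) {
            return result;
        }
        for (String url : shardUrls) {
            if (url != null && !url.trim().isEmpty()) {
                result.add(url);
            }
        }
        return result;
    }
}
```

An empty result then signals that no replica matched, and the caller can decide
whether to fail the query or fall back to an unfiltered one.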
was:
Add a mechanism to allow queries to use only a subset of replicas(by specifying
the wanted replica tag).
Replicas have to be marked with tags before running the query.
A query has to specify ShardParams.FILTER_BY_REPLICA_PROPERTY to specify that
it is indeed interested in replica property filtering.
Then it should specify ShardParams.FILTER or ShardParams.FILTERNOT set to
ShardParams.REPLICA_PROP followed by the property that has to be checked
followed by ":" and then the value wanted.
Excample:
ShardParams.FILTER_BY_REPLICA_PROPERTY = "true"&
In order to properly use this system, replicas need to be tagged, tagging a
replica involves setting the property ShardParams.REPLICA_TAG_NAME to a
property name and then set that property in the replicas.
An example can be seen in the ReplicaTagTest included in this patch where a
dynamic cloud has some tags assigned to it both randomly and on a fixed basis.
A replica can have multiple tags attached to it, and these tags are separated
by default by "|"(pipe character), the delimiter can be changed by setting
ShardParams.REPLICA_TAG_DELIMITER in the query to anything else.
No validity check is performed on the tags, therefore one may get an array of
shard URLs that contains empty URLs, or that is null(when the property does not
exist), the user of this feature has to deal with it.
The tag to replica mappings are rebuilt for each query that specifies
ShardParams.REPLICA_TAG_NAME.
> Support replica filtering by tag
> --------------------------------
>
> Key: SOLR-10880
> URL: https://issues.apache.org/jira/browse/SOLR-10880
> Project: Solr
> Issue Type: New Feature
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Domenico Fabio Marino
> Attachments: SOLR-10880.patch, SOLR-10880.patch, SOLR-10880.patch,
> SOLR-10880.patch, SOLR-10880.patch