[ 
https://issues.apache.org/jira/browse/SOLR-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136969#comment-16136969
 ] 

Domenico Fabio Marino commented on SOLR-10880:
----------------------------------------------

h2. *Current implementation:*
Replicas need to be "tagged" using replica properties.
For example, the property "flavour" is set to "banana|vanilla".

A request then names the property to be looked up (in this case "flavour") 
using the parameter "replica.tag.name", and specifies that it "likes" a value 
for that property (that is, it is interested exclusively in the replicas that 
have that value).
Example:
{code:java}
replica.tag.name=flavour&replica.like=banana
{code}

Alternatively, a request can specify that it wants the replicas that do not 
match a value ("dislike").
Example:

{code:java}
replica.tag.name=flavour&replica.dislike=vanilla
{code}
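
As a rough illustration of the like/dislike semantics described above, here is a minimal sketch in plain Java. It is not the code from the attached patch: the class and method names are hypothetical, replicas are reduced to a name mapped to the value of the tag property, and a replica without the tag property is assumed to simply not match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative only: hypothetical names, not the code from the patch.
final class ReplicaTagSketch {

  // Splits a tag property value such as "banana|vanilla" into its tags.
  static List<String> tags(String propertyValue, String delimiter) {
    if (propertyValue == null || propertyValue.isEmpty()) {
      return List.of();
    }
    // Pattern.quote is needed because "|" is a regex metacharacter.
    return Arrays.asList(propertyValue.split(Pattern.quote(delimiter)));
  }

  // like == true:  keep the replicas whose tag list contains the tag.
  // like == false: keep the replicas whose tag list does not contain it
  //                (assumption: untagged replicas count as "not matching").
  static List<String> select(Map<String, String> tagValueByReplica, String tag, boolean like) {
    List<String> selected = new ArrayList<>();
    for (Map.Entry<String, String> entry : tagValueByReplica.entrySet()) {
      boolean hasTag = tags(entry.getValue(), "|").contains(tag);
      if (hasTag == like) {
        selected.add(entry.getKey());
      }
    }
    return selected;
  }
}
```

With replicas tagged "banana|vanilla", "banana" and "chocolate", select(..., "banana", true) keeps the first two and select(..., "vanilla", false) keeps the last two.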

However, this behaviour is not very extensible and does not provide enough 
support for SOLR-10610.

h2. *Proposal:*
Following the suggestions from Tomás, Christine and I tried to come up with a 
solution that is both extensible and practical from a user's point of view. 
It is described below.

Please note that this is just a proposal and the code has not been written yet 
(however, it should not differ too much from the current implementation).

Replicas have to be tagged using replica properties (separated by | in this 
example).
Example:

shard1replica1 has birdColour=yellow and region=EMEA
shard2replica2 has region=US
shard3replica1 has birdColour=red

In order to use shard filtering, requests need to have the parameter 
{noformat}filterByReplicaProp{noformat} set to true.
This is needed because the computation for property filtering can be expensive 
(with a large number of replicas or properties) and the overhead may be 
noticeable.

h3. To use the replicas that have a specific property set to a specific value ("filter")

The request then should have the parameter {noformat}shards.filter{noformat} 
set to {noformat}replicaProp.PROPERTY_NAME:PROPERTY_VALUE{noformat}
Example:
{code:java}
filterByReplicaProp=true&shards.filter=replicaProp.region:EMEA
{code}
This means that the replica properties need to be inspected and that the 
request should only be executed on replicas that have the property 
{noformat}region{noformat} set to {noformat}EMEA{noformat}.
Given the tag setup above, this yields only shard1replica1.
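
Since no code has been written for this proposal yet, the following is only a sketch of how a {noformat}shards.filter{noformat} value could be parsed and applied. The class and method names are hypothetical, replicas are modelled as a plain map from replica name to properties, and the exception type is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; names and exception type are assumptions.
final class ShardsFilterSketch {

  static final String PREFIX = "replicaProp.";

  // Parses "replicaProp.region:EMEA" into {"region", "EMEA"}.
  static String[] parse(String filter) {
    if (filter == null || !filter.startsWith(PREFIX)) {
      throw new IllegalArgumentException("Expected replicaProp.NAME:VALUE but got: " + filter);
    }
    String rest = filter.substring(PREFIX.length());
    int colon = rest.indexOf(':');
    if (colon <= 0 || colon == rest.length() - 1) {
      throw new IllegalArgumentException("Expected replicaProp.NAME:VALUE but got: " + filter);
    }
    return new String[] {rest.substring(0, colon), rest.substring(colon + 1)};
  }

  // Returns the names of the replicas whose property NAME equals VALUE.
  static List<String> filter(Map<String, Map<String, String>> propsByReplica, String filter) {
    String[] nameAndValue = parse(filter);
    List<String> matching = new ArrayList<>();
    for (Map.Entry<String, Map<String, String>> entry : propsByReplica.entrySet()) {
      if (nameAndValue[1].equals(entry.getValue().get(nameAndValue[0]))) {
        matching.add(entry.getKey());
      }
    }
    return matching;
  }

  // The tag setup used in the example above.
  static Map<String, Map<String, String>> exampleCluster() {
    Map<String, Map<String, String>> cluster = new LinkedHashMap<>();
    cluster.put("shard1replica1", Map.of("birdColour", "yellow", "region", "EMEA"));
    cluster.put("shard2replica2", Map.of("region", "US"));
    cluster.put("shard3replica1", Map.of("birdColour", "red"));
    return cluster;
  }
}
```

Calling ShardsFilterSketch.filter(ShardsFilterSketch.exampleCluster(), "replicaProp.region:EMEA") returns a list containing only shard1replica1, matching the description above.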

h3. To use the replicas that do not have a specific property set to a specific value ("filterNot")
The request should have the parameter {noformat}shards.filterNot{noformat} set 
to {noformat}replicaProp.PROPERTY_NAME:PROPERTY_VALUE{noformat}
Example:
For the purpose of this example, suppose that there is a replica 
(shard3replica2) that is under maintenance and is therefore tagged with:
{noformat}maintenance=yes{noformat}

Then the request would need to have:
{code:java}
filterByReplicaProp=true&shards.filterNot=replicaProp.maintenance:yes
{code}
This means that the replica properties need to be inspected and that the 
request should only be executed on replicas that do not have "maintenance" set 
to "yes".

Using {noformat}shards.filter=replicaProp...{noformat} or 
{noformat}shards.filterNot=replicaProp...{noformat} without specifying 
{noformat}filterByReplicaProp=true{noformat} will cause an exception.
Using {noformat}filterByReplicaProp=true{noformat} without specifying a filter 
will not cause an exception, but it serves no purpose and wastes computation 
time.
Using filter or filterNot on a property that is not present on any replica is 
likely to cause an exception (this is an implementation detail).
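
The rules about combining the flag with the filters could be checked up front, roughly as in the sketch below. The parameter names come from this proposal; the class, the method and the choice of IllegalArgumentException are assumptions made for illustration, and request parameters are reduced to a plain map.

```java
import java.util.Map;

// Illustrative sketch only; class, method and exception type are assumptions.
final class FilterParamCheckSketch {

  private static boolean usesReplicaProp(String value) {
    return value != null && value.startsWith("replicaProp.");
  }

  // Returns true when the replica properties must be inspected for this
  // request. Throws when a replicaProp filter is given without the flag.
  static boolean mustInspectReplicaProps(Map<String, String> params) {
    boolean enabled = Boolean.parseBoolean(params.get("filterByReplicaProp"));
    boolean filtered = usesReplicaProp(params.get("shards.filter"))
        || usesReplicaProp(params.get("shards.filterNot"));
    if (filtered && !enabled) {
      throw new IllegalArgumentException(
          "shards.filter/shards.filterNot on replicaProp requires filterByReplicaProp=true");
    }
    // With the flag set but no filter, the inspection still happens and its
    // result goes unused: no exception, only wasted computation.
    return enabled;
  }
}
```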

h3. Extension of this proposal for Canary (SOLR-10610):
Given a suitably tagged environment, requests that want to use the canary 
component will have to specify both 
{noformat}filterByReplicaProp=true{noformat} and the 
{noformat}canary{noformat} parameter set to 
{noformat}PROPERTY_NAME:PROPERTY_VALUE{noformat}, for example:
{code:java}
filterByReplicaProp=true&canary=birdColour:yellow
{code}
This means that the replica properties need to be inspected and that the 
canary component has to use the "canaries" that have the property 
{noformat}birdColour{noformat} set to {noformat}yellow{noformat}.

Using a dedicated {noformat}canary{noformat} parameter clearly separates shard 
filtering from the canary component while offering a similar feature.

For further information on this component please refer to SOLR-10610.

Unfortunately, due to implementation details, we could not come up with a 
solution that did not involve {noformat}filterByReplicaProp{noformat} or a 
similar flag.
This is because HttpShardHandler (the only place that has access to all the 
replicas and their properties) is executed before any component and hides the 
Replica class from the components (the components are only given back a list 
of URLs, and finding the replicas associated with each URL would be a clear 
violation of encapsulation and separation of concerns).
Furthermore, we do not want HttpShardHandler to care about what the components 
are going to do with the replica properties, so that it is not tied to a 
specific implementation and does not need a myriad of conditionals during its 
execution.

h3. Future extensions (out of scope for this patch):

* Replica type filtering could be supported via:
 {noformat}shards.filter=replicaType:PULL{noformat} which means: only use 
replicas whose type is PULL
* and, similarly:
 {noformat}shards.filterNot=replicaType:NRT{noformat} which means: exclude all 
the replicas whose type is NRT
 
* Node role filtering is also a possible extension, for example:
 {noformat}shards.filter=nodeRole:analytics{noformat} which means: only use 
replicas whose role is analytics
* and, similarly:
 {noformat}shards.filterNot=nodeRole:overseer{noformat} which means: exclude 
all the replicas whose role is overseer
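
To show why the KEY:VALUE shape is meant to be extensible, here is a hedged sketch of how a single matching routine could dispatch on the key of the filter expression. Everything in it is hypothetical: ReplicaInfo is a minimal stand-in rather than Solr's Replica class, and allowing {noformat}shards.filter{noformat} and {noformat}shards.filterNot{noformat} together in one request is an assumption of the sketch, not part of the proposal.

```java
import java.util.Map;

// Illustrative sketch only; all names here are hypothetical.
final class FilterKindSketch {

  // Minimal stand-in for a replica; not Solr's Replica class.
  static final class ReplicaInfo {
    final String type;
    final String nodeRole;
    final Map<String, String> props;

    ReplicaInfo(String type, String nodeRole, Map<String, String> props) {
      this.type = type;
      this.nodeRole = nodeRole;
      this.props = props;
    }
  }

  // True when the replica matches a single KEY:VALUE filter expression.
  static boolean matches(ReplicaInfo replica, String filter) {
    int colon = filter.indexOf(':');
    if (colon <= 0) {
      throw new IllegalArgumentException("Expected KEY:VALUE but got: " + filter);
    }
    String key = filter.substring(0, colon);
    String value = filter.substring(colon + 1);
    if (key.startsWith("replicaProp.")) {
      return value.equals(replica.props.get(key.substring("replicaProp.".length())));
    }
    switch (key) {
      case "replicaType":
        return value.equals(replica.type);
      case "nodeRole":
        return value.equals(replica.nodeRole);
      default:
        throw new IllegalArgumentException("Unknown filter key: " + key);
    }
  }

  // shards.filter keeps matching replicas; shards.filterNot drops them.
  // A null argument means the parameter was not given.
  static boolean keep(ReplicaInfo replica, String filter, String filterNot) {
    boolean wanted = filter == null || matches(replica, filter);
    boolean excluded = filterNot != null && matches(replica, filterNot);
    return wanted && !excluded;
  }
}
```

In this shape, adding a new kind of filter only means adding a new key to the dispatch, while the request parameters stay the same.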

> Support replica filtering by tag
> --------------------------------
>
>                 Key: SOLR-10880
>                 URL: https://issues.apache.org/jira/browse/SOLR-10880
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Domenico Fabio Marino
>         Attachments: SOLR-10880.patch, SOLR-10880.patch, SOLR-10880.patch, 
> SOLR-10880.patch
>
>
> Add a mechanism to allow queries to use only a subset of replicas (by 
> specifying the wanted replica tag).
> Some replicas have to be marked with a tag before running the query.
> A query has to specify ShardParams.REPLICA_TAG_NAME to specify what property 
> holds the tag it wants to use (for example "replica.tag") and then use 
> ShardParams.REPLICA_TAG_LIKE "tagName" to tell the ShardHandler to only use 
> the replicas matching tagName.
> A query can also use ShardParams.REPLICA_TAG_DISLIKE "tagName" to use all the 
> replicas that do not match tagName.
> In order to properly use this system, replicas need to be tagged. Tagging a 
> replica involves setting ShardParams.REPLICA_TAG_NAME to a property name and 
> then setting that property on the replicas.
> An example can be seen in the ReplicaTagTest included in this patch where a 
> dynamic cloud has some tags assigned to it both randomly and on a fixed basis.
> A replica can have multiple tags attached to it, and these tags are separated 
> by default by "|"(pipe character), the delimiter can be changed by setting 
> ShardParams.REPLICA_TAG_DELIMITER in the query to anything else.
> No validity check is performed on the tags, therefore one may get an array of 
> shard URLs that contains empty URLs, or that is null(when the property does 
> not exist), the user of this feature has to deal with it.
> The tag to replica mappings are rebuilt for each query that specifies 
> ShardParams.REPLICA_TAG_NAME. 


