[ 
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-6220:
----------------------------------------
    Description: 
h1.Objective
Most cloud-based systems allow users to specify rules on how the replicas/nodes of a 
cluster are allocated. Solr should have a flexible mechanism through which we 
can control the allocation of replicas, or later change it to suit the 
needs of the system.

All configuration is on a per-collection basis. The rules are applied whenever a 
replica is created in any of the shards of a given collection during:

 * collection creation
 * shard splitting
 * add replica
 * createshard

There are two aspects to how replicas are placed: snitch and placement. 

h2.Snitch 
A snitch identifies the tags of nodes. Snitches are configured through the collection 
create command with the snitch param, e.g. snitch=EC2Snitch or 
snitch=class:EC2Snitch
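
For illustration, a hypothetical create call passing the snitch parameter (the host, port, and the other collection parameters below are placeholders, not part of this proposal):
{noformat}
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&snitch=class:EC2Snitch
{noformat}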


h2.ImplicitSnitch 
This is shipped by default with Solr. The user does not need to specify 
{{ImplicitSnitch}} in configuration. If the tags known to ImplicitSnitch are 
present in the rules, it is used automatically.
Tags provided by ImplicitSnitch:
# cores : number of cores in the node
# disk : disk space available in the node
# host : host name of the node
# node : node name 
# D.* : values available from system properties. {{D.key}} means a 
value that is passed to the node as {{-Dkey=keyValue}} during node startup. 
It is possible to use rules like {{D.key:expectedVal,shard:*}}
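
For illustration, assuming a node is started with a custom system property (the property name {{rack}} and the value {{738}} below are hypothetical, as is the startup command), a rule can then match on that value through the {{D.*}} tag:
{noformat}
# start the node with a custom system property (illustrative startup command and value)
bin/solr start -c -Drack=738

# at most one replica of each shard on nodes whose rack system property is 738
rule=D.rack:738,shard:*,replica:1-
{noformat}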

h2.Rules 

A rule tells how many replicas of a given shard need to be assigned to nodes 
with the given key-value pairs. These rules are passed on to the 
collection CREATE API as a multivalued parameter "rule". The values are 
saved in the state of the collection as follows:
{code:Javascript}
{
  "mycollection":{
    "snitch": {
      "class":"ImplicitSnitch"
    },
    "rules":[{"cores":"4-"},
             {"replica":"1", "shard":"*", "node":"*"},
             {"disk":"1+"}]
  }
}
{code}

A rule is specified in a pseudo-JSON syntax, which is a map of keys and values.
* Each collection can have any number of rules. As long as the rules do not 
conflict with each other it is OK; otherwise an error is thrown.
* In each rule, shard and replica can be omitted (see the example after this list).
** The default value of replica is {{\*}}, which means ANY; or you can specify a count and 
an operand such as {{+}} or {{-}}.
** The value of shard can be a shard name, or {{\*}} meaning EACH, or {{**}} 
meaning ANY. The default value is {{\*\*}} (ANY).
* There should be exactly one extra condition in a rule other than {{shard}} 
and {{replica}}.  
* All keys other than {{shard}} and {{replica}} are called tags, and the tags 
are nothing but values provided by the snitch for each node.
* By default, certain tags such as {{node}}, {{host}}, {{port}} are provided by 
the system implicitly.
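
For illustration, a rule that specifies only a tag picks up the defaults for shard and replica described above (the disk value here is just an example):
{noformat}
# these two rules are equivalent: shard defaults to ** (ANY) and replica defaults to * (ANY)
disk:100+
shard:**,replica:*,disk:100+
{noformat}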

h3.How are nodes picked up? 
Nodes are not picked up at random. The rules are used to first sort the nodes 
according to affinity. For example, if there is a rule that says 
{{disk:100+}}, nodes with 100GB or more are given higher preference. If all 
else are equal, nodes with fewer cores are given higher priority.

h3.Fuzzy match
Fuzzy matching can be applied when strict matches fail. The values can be prefixed 
with {{~}} to specify fuzziness.

Example rules:
{noformat}
#Example requirement: "use only one replica of a shard in a host if possible;
#if no matches are found, relax that rule"
rack:*,shard:*,replica:~1-

#Another example: assign all replicas to nodes with disk space of 100GB or
#more, or relax the rule if that is not possible
disk:~100+
{noformat}
Examples:
{noformat}
#in each rack there can be at most two replicas of a given shard
 rack:*,shard:*,replica:2-
#in each rack there can be at most two replicas of ANY shard
 rack:*,shard:**,replica:2-
 rack:*,replica:2-

 #in each node there should be at most one replica of EACH shard
 node:*,shard:*,replica:1-
 #in each node there should be at most one replica of ANY shard
 node:*,shard:**,replica:1-
 node:*,replica:1-
 
#In rack 738 there can be at most 0 replicas of shard1
 rack:738,shard:shard1,replica:0-
 
 #All replicas of shard1 should go to rack 730
 shard:shard1,replica:*,rack:730
 shard:shard1,rack:730

 #All replicas must be created on a node with at least 20GB of disk
 replica:*,shard:*,disk:20+
 replica:*,disk:20+
 disk:20+
#All replicas should be created on nodes with fewer than 5 cores
#In this case ANY and EACH have the same meaning for shard
replica:*,shard:**,cores:5-
replica:*,cores:5-
cores:5-
#one replica of shard1 must go to node 192.168.1.2:8080_solr
node:"192.168.1.2:8080_solr", shard:shard1, replica:1
#No replica of shard1 should go to rack 738
rack:!738,shard:shard1,replica:*
rack:!738,shard:shard1
#No replica of ANY shard should go to rack 738
rack:!738,shard:**,replica:*
rack:!738,shard:*
rack:!738
{noformat}



In the collection create API, all the placement rules are provided as a 
multivalued parameter called "rule".
Example:
{noformat}
snitch=EC2Snitch&rule=shard:*,replica:1,dc:dc1&rule=shard:*,replica:2-,dc:dc3&rule=shard:shard1,replica:,rack:!738
 
{noformat}
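
Putting it together, a hypothetical full create request might look like the following (the host, port, and non-rule parameters are placeholders):
{noformat}
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&snitch=EC2Snitch&rule=shard:*,replica:1,dc:dc1&rule=shard:*,replica:2-,dc:dc3
{noformat}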



> Replica placement strategy for solrcloud
> ----------------------------------------
>
>                 Key: SOLR-6220
>                 URL: https://issues.apache.org/jira/browse/SOLR-6220
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>         Attachments: SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch
>


