[
https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Noble Paul updated SOLR-6220:
-----------------------------
Description:
h1.Objective
Most cloud-based systems allow users to specify rules on how the replicas/nodes
of a cluster are allocated. Solr should have a flexible mechanism through which
we can control the allocation of replicas, or change it later, to suit the
needs of the system.
All configuration is on a per-collection basis. The rules are applied whenever
a replica is created in any of the shards of a given collection, i.e. during
* collection creation
* shard splitting
* add replica
* createshard
There are two aspects to how replicas are placed: snitch and placement.
h2.Snitch
A snitch identifies the tags of a node. Snitches are configured through the
collection create command with the snitch prefix, e.g. snitch.type=EC2Snitch.
The system provides the following implicit tag names, which cannot be used by
other snitches (an illustrative tag set is shown after this list):
* node : The Solr node name
* host : The hostname
* ip : The IP address of the host
* cores : A dynamic variable which gives the core count at any given point
* disk : A dynamic variable which gives the available disk space at any given
point
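As an illustration only, the full tag set a snitch could report for a single
node might look like the following (the node and all values, including the
unit assumed for disk, are hypothetical; dc and rack would come from a
configured snitch such as EC2Snitch):
{code:JavaScript}
{
  "node": "192.168.1.5:8983_solr",
  "host": "192.168.1.5",
  "ip": "192.168.1.5",
  "cores": 7,
  "disk": 120,
  "dc": "dc1",
  "rack": "168"
}
{code}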
There will be a few snitches provided by the system, such as:
h3.EC2Snitch
Provides two tags, "dc" and "rack", derived from the region and zone values in
EC2.
h3.IPSnitch
Uses the IP address to infer the "dc" and "rack" values.
h3.NodePropertySnitch
This lets users provide tag names and values to each node through a system
property, for example: -Dsolrcloud.snitch.vals=tag-x:val-a,tag-y:val-b. This
means this particular node will have two tags, "tag-x" and "tag-y".
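For example, with the legacy Jetty-based startup the property could be passed
on the command line as shown below (the startup command itself is incidental;
only the -D property matters here). The snitch on that node would then report
tag-x=val-a and tag-y=val-b.
{noformat}
java -Dsolrcloud.snitch.vals=tag-x:val-a,tag-y:val-b -jar start.jar
{noformat}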
h3.RestSnitch
Lets the user configure a URL which the server can invoke to get all the tags
for a given node.
This takes extra parameters in the create command, for example:
{{snitch.type=RestSnitch&snitch.url=http://snitchserverhost:port?nodename={}}}
The response of the REST call
{{http://snitchserverhost:port/?nodename=192.168.1:8080_solr}}
must be in either JSON format or properties format, e.g.:
{code:JavaScript}
{
  "tag-x": "x-val",
  "tag-y": "y-val"
}
{code}
or
{noformat}
tag-x=x-val
tag-y=y-val
{noformat}
h3.ManagedSnitch
This snitch keeps a list of nodes and their tag/value pairs in ZooKeeper. The
user should be able to manage the tags and values of each node through a
collection API. A sketch of what the stored data could look like follows.
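The exact ZooKeeper layout is not specified by this issue; purely as a sketch,
the stored node-to-tags mapping could look like:
{code:JavaScript}
{
  "192.168.1.5:8983_solr": {"rack": "168", "tag-x": "val-a"},
  "192.168.1.6:8983_solr": {"rack": "730", "tag-x": "val-b"}
}
{code}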
h2.Placement
This tells how many replicas of a given shard need to be assigned to nodes
with the given key/value pairs. These parameters will be passed to the
collection CREATE API as a parameter called "placement". The values will be
saved in the state of the collection as follows:
{code:JavaScript}
{
  "mycollection": {
    "snitch": {
      "type": "EC2Snitch"
    },
    "placement": {
      "key1": "value1",
      "key2": "value2"
    }
  }
}
{code}
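With actual rules in place of the key1/value1 placeholders (using rules from
the examples further below), the same state might look like:
{code:JavaScript}
{
  "mycollection": {
    "snitch": {
      "type": "EC2Snitch"
    },
    "placement": {
      "*.1": "dc->dc1",
      "shard1.*": "rack->730"
    }
  }
}
{code}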
A rule consists of two parts:
* LHS, or the qualifier : The format is \{shardname}.\{replicacount}. Use
the wildcard "*" to qualify all.
* RHS, or the condition : The format is \{tagname}\{operand}\{value}. The tag
names and values are provided by the snitch. The supported operands are:
** -> : equals
** > : greater than. Only applicable to numeric tags
** < : less than. Only applicable to numeric tags
** ! : NOT, or not equals
Each collection can have any number of rules. As long as the rules do not
conflict with each other it is OK; otherwise an error is thrown.
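For instance (a hypothetical case, not from the examples below), the following
pair of rules would be rejected, since no assignment can keep both exactly one
and exactly two replicas of every shard in dc:dc1:
{noformat}
placement=*.1:dc->dc1|*.2:dc->dc1
{noformat}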
Example rules:
* "shard1.1":"dc->dc1&rack->168" : This would assign exactly 1 replica of
shard1 to nodes having the tags "dc=dc1,rack=168"
* "shard1.1+":"dc->dc1&rack->168" : Same as above, but assigns at least one
replica to that tag/value combination
* "*.1":"dc->dc1" : For all shards, keep exactly one replica in dc:dc1
* "*.1+":"dc->dc2" : At least one replica needs to be in dc:dc2
* "*.2-":"dc->dc3" : Keep a maximum of 2 replicas in dc:dc3 for all shards
* "shard1.*":"rack->730" : All replicas of shard1 will go to rack 730
* "shard1.1":"node->192.167.1.2:8983_solr" : 1 replica of shard1 must go to
the node 192.167.1.2:8983_solr
* "shard1.*":"rack!738" : No replica of shard1 should go to rack 738
* "shard1.*":"host!192.168.89.91" : No replica of shard1 should go to host
192.168.89.91
* "*.*":"cores<5" : All replicas should be created on nodes with fewer than 5
cores
* "*.*":"disk>20gb" : All replicas must be created on nodes with more than
20GB of available disk space
In the collection create API, all the placement rules are provided in a
parameter called "placement", and multiple rules are separated with "|".
Example:
{noformat}
snitch.type=EC2Snitch&placement=*.1:dc->dc1|*.2-:dc->dc3|shard1.*:rack!738
{noformat}
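Putting the snitch and placement parameters together, a complete collection
CREATE call could look like the following (the standard Collections API
parameters name, numShards and replicationFactor are shown only for context;
line breaks added for readability):
{noformat}
/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=3
    &snitch.type=EC2Snitch
    &placement=*.1:dc->dc1|*.2-:dc->dc3|shard1.*:rack!738
{noformat}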
> Replica placement strategy for SolrCloud
> ----------------------------------------
>
> Key: SOLR-6220
> URL: https://issues.apache.org/jira/browse/SOLR-6220
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Reporter: Noble Paul
> Assignee: Noble Paul