[ 
https://issues.apache.org/jira/browse/SOLR-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519668#comment-16519668
 ] 

Jerry Bao edited comment on SOLR-12495 at 6/21/18 6:40 PM:
-----------------------------------------------------------

{quote}well
{code:java}
{"replica": "#MINIMUM", "node": "#ANY"}
{code}
means it is applied on a per collection basis
{quote}
That seems confusing to me; the way I read it is: keep a minimum number of 
replicas on every node. Just to clarify, when you say per-collection basis, 
do you mean that each collection is balanced on its own? If so, will there be a 
way to keep the entire cluster balanced irrespective of collection? Is that 
covered by the core preference? My concern is that without a way to keep the 
entire cluster balanced irrespective of collection, you'll end up with some 
nodes holding one replica of every collection and other nodes holding 0 
replicas. For example, if you had three collections with 30 replicas each and 
45 nodes, you could end up with 30 nodes that each hold one replica of every 
collection and 15 nodes with 0 replicas, which is unbalanced.
{quote}In reality, it works slightly differently. The value "<3" is not a 
constant; it keeps changing as each replica is created. For instance, when 
replica #40 is being created, the value is (40/40 = 1), which is like saying 
{{replica:"<2"}}, whereas when replica #41 is created, it suddenly becomes 
{{"replica" : "<3"}}. So allocations actually happen evenly.
{quote}
I understand that it's not constant, but what I'm saying is that the rule can 
be satisfied while the cluster is still not balanced. If I have 42 replicas and 
40 nodes, I would want 1 replica on every node before any node gets 2. 
ceil(42/40) = 2 gives a "<3" rule, which allows 2 replicas on each of 21 nodes 
and 0 on the other 19; that satisfies the rule but is not balanced.
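A small sketch of that distinction, in plain Java with hypothetical names (nothing here is Solr API). It assumes the bound is ceil(replicas / nodes) and, purely for illustration, defines "balanced" as node counts differing by at most one. It compares a packed layout with a round-robin layout for 42 replicas on 40 nodes.

```java
import java.util.Arrays;

final class BalanceCheck {
    // Upper bound implied by the rule, assuming ceil(replicas / nodes).
    static int maxPerNode(int replicas, int nodes) {
        return (int) Math.ceil((double) replicas / nodes);
    }

    // True if every node holds at most `limit` replicas.
    static boolean satisfiesRule(int[] perNode, int limit) {
        return Arrays.stream(perNode).allMatch(count -> count <= limit);
    }

    // Illustrative definition: node counts differ by at most one.
    static boolean isBalanced(int[] perNode) {
        int max = Arrays.stream(perNode).max().orElse(0);
        int min = Arrays.stream(perNode).min().orElse(0);
        return max - min <= 1;
    }

    // Fills each node up to the limit before moving on to the next one.
    static int[] packed(int replicas, int nodes, int limit) {
        int[] perNode = new int[nodes];
        int remaining = replicas;
        for (int n = 0; n < nodes && remaining > 0; n++) {
            perNode[n] = Math.min(limit, remaining);
            remaining -= perNode[n];
        }
        return perNode;
    }

    // Places one replica per node in turn, wrapping around.
    static int[] roundRobin(int replicas, int nodes) {
        int[] perNode = new int[nodes];
        for (int r = 0; r < replicas; r++) {
            perNode[r % nodes]++;
        }
        return perNode;
    }

    public static void main(String[] args) {
        int limit = maxPerNode(42, 40);
        int[] a = packed(42, 40, limit);
        int[] b = roundRobin(42, 40);
        System.out.println("packed:      rule=" + satisfiesRule(a, limit)
                + " balanced=" + isBalanced(a));
        System.out.println("round-robin: rule=" + satisfiesRule(b, limit)
                + " balanced=" + isBalanced(b));
    }
}
```

Both layouts satisfy the "<3" bound, but only the round-robin one passes the balance check, so the bound alone does not distinguish them.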


> Make it possible to evenly distribute replicas
> ----------------------------------------------
>
>                 Key: SOLR-12495
>                 URL: https://issues.apache.org/jira/browse/SOLR-12495
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: AutoScaling
>            Reporter: Noble Paul
>            Priority: Major
>
> Support a new function value for {{replica= "#MINIMUM"}}
> {{#MINIMUM}} means the minimum computed value for the given configuration
> the value of replica will be calculated as  {{<= 
> Math.ceil(number_of_replicas/number_of_valid_nodes) }}
> *example 1:*
> {code:java}
> {"replica" : "#MINIMUM" , "shard" : "#EACH" , "node" : "#ANY"}
> {code}
> *case 1* : nodes=3, replicationFactor=4
>  the value of replica will be calculated as {{Math.ceil(4/3) = 2}}
> current state : nodes=3, replicationFactor=2
> this is equivalent to the hard coded rule
> {code:java}
> {"replica" : "<3" , "shard" : "#EACH" , "node" : "#ANY"}
> {code}
> *case 2* : 
> current state : nodes=3, replicationFactor=2
> this is equivalent to the hard coded rule
> {code:java}
> {"replica" : "<3" , "shard" : "#EACH" , "node" : "#ANY"}
> {code}
> *example:2*
> {code}
> {"replica" : "#MINIMUM"  , "node" : "#ANY"}{code}
> case 1: numShards = 2, replicationFactor=3, nodes = 5
> this is equivalent to the hard coded rule
> {code:java}
> {"replica" : "<3" , "node" : "#ANY"}
> {code}
> *example:3*
> {code}
> {"replica" : "<2"  , "shard" : "#EACH" , "port" : "8983"}{code}
> case 1: {{replicationFactor=3, nodes with port 8983 = 2}}
> this is equivalent to the hard coded rule
> {code}
> {"replica" : "<3"  , "shard" : "#EACH" , "port" : "8983"}{code}

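For reference, the arithmetic in the issue description quoted above can be sketched as follows. This is a hedged illustration in plain Java with hypothetical names, not the Solr implementation; it assumes the computed value is ceil(total replicas / valid nodes) and that "at most n" is written as the rule string "<(n+1)".

```java
final class MinimumValue {
    // ceil(totalReplicas / validNodes). The cast to double matters:
    // Math.ceil(4 / 3) with int operands would evaluate to 1.0, not 2.0.
    static int minimum(int totalReplicas, int validNodes) {
        return (int) Math.ceil((double) totalReplicas / validNodes);
    }

    // Renders "at most n replicas" in the "<(n+1)" form used by the rules.
    static String equivalentRule(int totalReplicas, int validNodes) {
        return "<" + (minimum(totalReplicas, validNodes) + 1);
    }

    public static void main(String[] args) {
        // example 1, case 1: replicationFactor=4, nodes=3 (per shard)
        System.out.println(equivalentRule(4, 3));
        // example 2, case 1: numShards=2, replicationFactor=3, nodes=5
        System.out.println(equivalentRule(2 * 3, 5));
        // example 3, case 1: replicationFactor=3, nodes with port 8983 = 2
        System.out.println(equivalentRule(3, 2));
    }
}
```

Under these assumptions all three inputs yield "<3", matching the hard-coded rules shown for those cases.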

