[ 
https://issues.apache.org/jira/browse/SOLR-15803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17454244#comment-17454244
 ] 

Houston Putman commented on SOLR-15803:
---------------------------------------

I have made another fairly significant change to the {{LegacyAssignStrategy}} 
implementation. Previously, if multiple shards were included in a single 
AssignRequest (e.g. assign 2 replicas each of shard1 and shard2 in collection 
foo), an ordering of desired nodes was made first, then replicas were assigned 
to each node (0, 1, 2) in that ordering. The node index counter did not reset 
between shards, so for the example above with 3 live nodes, replicas would be 
assigned the following node orders:
 * shard-1-replica-1 -> node order 0
 * shard-1-replica-2 -> node order 1
 * shard-2-replica-1 -> node order 2
 * shard-2-replica-2 -> node order 0

This works fairly well if the cluster is empty; however, it does not 
necessarily provide a good heuristic when replicas are already unbalanced 
across the nodes.
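
A minimal sketch of the previous behaviour described above, written in Python for brevity rather than taken from Solr's Java code. The function name {{assign_round_robin}} and the node names are hypothetical; the only thing it models is a single node ordering with an index counter that is not reset between shards.

```python
def assign_round_robin(ordered_nodes, shards, replicas_per_shard):
    """Illustrative only: walk one fixed node ordering with a shared counter."""
    placements = {}
    index = 0  # shared across shards, never reset
    for shard in shards:
        for replica in range(1, replicas_per_shard + 1):
            node = ordered_nodes[index % len(ordered_nodes)]
            placements[(shard, replica)] = node
            index += 1
    return placements


# The ordering is computed once, up front, so placements made while handling
# shard1 never influence where the replicas of shard2 go.
old_result = assign_round_robin(["node0", "node1", "node2"], ["shard1", "shard2"], 2)
# shard1 replicas land on node0 and node1, shard2 replicas on node2 and node0
```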

Instead, in the new logic, I have decided to re-order the node list before 
assigning replicas for each shard. So in the above example, shard1-replica1 and 
shard1-replica2 would be assigned, then the node list would be re-sorted 
(taking into account where shard1-replica1 and shard1-replica2 now live). 
Then shard2-replica1 and shard2-replica2 would be assigned based on the new 
ordering of nodes.
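
The re-sorting step can be sketched as follows. This is an illustration in Python, not the Solr implementation: {{assign_with_resort}} is a hypothetical name, the sort key is simplified to a plain per-node core count, and ties are broken by node name purely to keep the output deterministic.

```python
def assign_with_resort(core_counts, shards, replicas_per_shard):
    """Illustrative only: re-sort the nodes before each shard is assigned."""
    counts = dict(core_counts)  # work on a copy; caller's dict is untouched
    placements = {}
    for shard in shards:
        # Re-sort using counts that include placements from earlier shards.
        ordered = sorted(counts, key=lambda node: (counts[node], node))
        for replica in range(1, replicas_per_shard + 1):
            node = ordered[(replica - 1) % len(ordered)]
            placements[(shard, replica)] = node
            counts[node] += 1
    return placements


# Unbalanced start: node0 already hosts 5 cores, the other two nodes are empty.
new_result = assign_with_resort(
    {"node0": 5, "node1": 0, "node2": 0}, ["shard1", "shard2"], 2
)
# Every replica goes to node1 or node2; the loaded node0 receives none.
```

With a single up-front ordering and a shared counter, the third replica in this scenario would have been placed on the already loaded node0.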

This does add computation time when, for example, creating a collection; 
however, it does guarantee a better distribution of replicas across the nodes 
(when not using an autoscaling plugin).

Another change that has made it in here (necessary for the ReplaceNode command, 
since we use a nodeList that does not contain the source node): previously, if 
the LegacyAssignStrategy was passed a list of nodes, it would use the given 
list as the "ordering". Now the list is sorted according to the same logic used 
when no nodeList is provided (number of cores on the node, with a large weight 
applied to cores of the same collection).
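
One way to picture that sort key, as a Python sketch under stated assumptions: the weight value, the exact formula (total cores plus a weighted count of same-collection cores), the name-based tie-break, and the names {{sort_nodes}} and {{SAME_COLLECTION_WEIGHT}} are all hypothetical and are not taken from the Solr source.

```python
SAME_COLLECTION_WEIGHT = 100  # hypothetical value; the real weight may differ


def sort_nodes(node_list, cores_on_node, collection):
    """Illustrative only: order nodes by load, ignoring the caller's order.

    cores_on_node maps a node name to the collection name of each core on it.
    """
    def load(node):
        cores = cores_on_node.get(node, [])
        same_collection = sum(1 for name in cores if name == collection)
        return len(cores) + SAME_COLLECTION_WEIGHT * same_collection

    return sorted(node_list, key=lambda node: (load(node), node))


cores = {"nodeA": ["foo"], "nodeB": ["bar", "bar", "bar"], "nodeC": []}
# nodeA hosts only one core, but it belongs to collection foo, so it sorts last.
sorted_nodes = sort_nodes(["nodeA", "nodeB", "nodeC"], cores, "foo")
# sorted_nodes == ["nodeC", "nodeB", "nodeA"]
```

The order in which the caller supplies the node list has no effect on the result, which is the behavioural change described above.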

> Allow AssignStrategy to process multiple AssignRequests with 
> cross-coordination
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-15803
>                 URL: https://issues.apache.org/jira/browse/SOLR-15803
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Houston Putman
>            Assignee: Houston Putman
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When doing testing for SOLR-15795, I found that if you have an empty node 
> when running the REPLACENODE command, then many times all replicas will be 
> placed on that same node, even if it doesn't result in an even distribution 
> in your cluster.
> When looking at the code, it made sense. The ReplaceNodeCmd goes through a 
> loop for every replica on the sourceNode, and uses the AssignStrategy class 
> to assign a node for each replica, using the clusterstate. However, the 
> clusterstate does not change between these replicas, so the most advantageous 
> node for 1 replica, is likely going to be the most advantageous for many 
> replicas given the same cluster state. Therefore all replicas were being 
> scheduled for the same node in my testing.
> An easy (in theory) solution is to let AssignStrategy take a list of 
> AssignRequests in assign(), and each request in this list will account for 
> the replicaPlacements decided for the previous requests in the list. That 
> way, the ReplaceNodeCmd can create its list of AssignRequests, and issue 
> them all at once to AssignStrategy, which will come up with the _optimal_ 
> plan for all replicas *together*.
> Because this is an API in assignStrategy, it will work with the new 
> autoscaling APIs or using the legacy assign strategy.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)