[ 
https://issues.apache.org/jira/browse/SOLR-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367410#comment-15367410
 ] 

Noble Paul commented on SOLR-7280:
----------------------------------

Had a chat with [~shalinmangar] and came up with the following design.

h4. Objectives

* Move away from the current design of an unbounded number of threads for core 
loads, which leads to OOM or other issues
* Avoid the leaderVoteWait problem, which leaves shards without a leader for a 
long time or even leaves shards down

Blindly sorting cores based on replica names is not foolproof. It can lead to 
deadlocks depending on how the replicas are distributed. The sorting logic 
could be as follows.

h5. Core Sorting logic
When a node comes up, it reads the list of live nodes and the state of each 
collection it hosts. It then constructs a list of the shards 
({{collectionName+shardName}}) it hosts, sorted by the sum of the number of 
replicas of that shard on other started nodes and the number of replicas of 
that shard on the current node. Ties are broken by sorting 
{{collectionName+shardName}} alphabetically. This ensures that no other node 
is waiting for some replica on this node to come up.
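
To make the ordering concrete, here is a minimal sketch of the comparator 
described above. It is not the attached patch: the class and field names 
({{ShardLoadOrder}}, {{HostedShard}}) are hypothetical, the replica counts are 
assumed to have been read from the cluster state already, and ascending order 
by replica count is an assumption, since the direction is not stated above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ShardLoadOrder {

    /** Hypothetical holder for one shard that the current node hosts. */
    static final class HostedShard {
        final String collectionName;
        final String shardName;
        final int replicasOnOtherStartedNodes;
        final int replicasOnThisNode;

        HostedShard(String collectionName, String shardName,
                    int replicasOnOtherStartedNodes, int replicasOnThisNode) {
            this.collectionName = collectionName;
            this.shardName = shardName;
            this.replicasOnOtherStartedNodes = replicasOnOtherStartedNodes;
            this.replicasOnThisNode = replicasOnThisNode;
        }

        /** Sort key: replicas on other started nodes plus replicas on this node. */
        int replicaCount() {
            return replicasOnOtherStartedNodes + replicasOnThisNode;
        }

        /** Tie-break key: collectionName+shardName. */
        String key() {
            return collectionName + shardName;
        }
    }

    // Assumption: ascending by replica count. Use LOAD_ORDER.reversed() if the
    // intended direction is the opposite; ties fall back to alphabetic order.
    static final Comparator<HostedShard> LOAD_ORDER =
            Comparator.comparingInt(HostedShard::replicaCount)
                      .thenComparing(HostedShard::key);

    /** Returns the collectionName+shardName keys in the order cores would load. */
    static List<String> sortedKeys(List<HostedShard> hosted) {
        List<HostedShard> copy = new ArrayList<>(hosted); // leave the input untouched
        copy.sort(LOAD_ORDER);
        List<String> keys = new ArrayList<>();
        for (HostedShard shard : copy) {
            keys.add(shard.key());
        }
        return keys;
    }
}
```

Because every node derives the same key from the same cluster state, nodes 
that host replicas of the same shards should arrive at a consistent relative 
order, which is what the alphabetic tie-break is for.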

h5. Thread count
The default number of {{coreLoadThreads}} should be much higher for SolrCloud 
(maybe 50?). The user should be able to override the value by explicitly 
configuring it.
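
A rough sketch of how the default and the override could fit together with a 
bounded pool follows. The class name {{CoreLoadThreads}}, both constants, and 
the one-hour wait are assumptions for illustration only. The value 50 is just 
the number floated above, and the standalone default is a placeholder.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class CoreLoadThreads {

    // Assumption: the "maybe 50?" value suggested for SolrCloud.
    static final int ASSUMED_CLOUD_DEFAULT = 50;
    // Assumption: placeholder default for non-cloud mode.
    static final int ASSUMED_STANDALONE_DEFAULT = 3;

    /** An explicitly configured value always wins; null means "not configured". */
    static int resolve(Integer configured, boolean cloudMode) {
        if (configured != null) {
            if (configured < 1) {
                throw new IllegalArgumentException("coreLoadThreads must be >= 1");
            }
            return configured;
        }
        return cloudMode ? ASSUMED_CLOUD_DEFAULT : ASSUMED_STANDALONE_DEFAULT;
    }

    /**
     * Runs the core load tasks on a fixed-size pool, submitting them in list
     * order. Returns true if all tasks finished within the wait period.
     */
    static boolean loadAll(List<Runnable> coreLoadTasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (Runnable task : coreLoadTasks) {
                pool.execute(task);
            }
        } finally {
            pool.shutdown(); // accept no new tasks; queued ones still run
        }
        try {
            return pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }
}
```

A fixed pool caps the number of concurrent core loads, and submitting tasks in 
the sorted order means the cores sorted first are also the first to start 
loading.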
 


> Load cores in sorted order and tweak coreLoadThread counts to improve cluster 
> stability on restarts
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-7280
>                 URL: https://issues.apache.org/jira/browse/SOLR-7280
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Noble Paul
>             Fix For: 5.2, 6.0
>
>         Attachments: SOLR-7280.patch
>
>
> In SOLR-7191, Damien mentioned that by loading solr cores in a sorted order 
> and tweaking some of the coreLoadThread counts, he was able to improve the 
> stability of a cluster with thousands of collections. We should explore some 
> of these changes and fold them into Solr.


