[ 
https://issues.apache.org/jira/browse/SOLR-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925821#comment-15925821
 ] 

Tim Owen commented on SOLR-7191:
--------------------------------

Admittedly not thousands of collections, but another anecdote. Each of our 
clusters is 12 hosts running 6 nodes each, with 165 collections of 16 shards 
each, 3x replication. So around 7900 cores spread over 72 nodes (roughly 110 
each).
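
For anyone wanting to check the figures, the arithmetic can be reproduced with a few lines of Java. This is only an illustration; the class and method names are made up for the example.

```java
public class ClusterSizing {
    // Total cores = collections * shards per collection * replicas per shard.
    static int totalCores(int collections, int shardsPerCollection, int replicationFactor) {
        return collections * shardsPerCollection * replicationFactor;
    }

    // Number of Solr nodes = hosts * nodes per host.
    static int totalNodes(int hosts, int nodesPerHost) {
        return hosts * nodesPerHost;
    }

    public static void main(String[] args) {
        int cores = totalCores(165, 16, 3); // 165 * 16 * 3 = 7920
        int nodes = totalNodes(12, 6);      // 12 * 6 = 72
        System.out.println(cores + " cores over " + nodes + " nodes, "
                + (cores / nodes) + " per node on average");
    }
}
```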

To get stable restarts we throttle the recovery thread pool size; see 
SOLR-9936, the ticket I raised with our patch. Without that, the volume of 
recovery traffic saturates the network and disks, and the cluster status 
never settles.
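
As a rough illustration of the throttling idea (this is not the SOLR-9936 patch and not Solr's own code), a fixed-size executor caps how many recoveries run at once. The class name, the method name and the sleep that stands in for a real recovery are all assumptions made for the sketch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThrottledRecovery {
    // Runs `tasks` simulated recoveries on a pool capped at `maxConcurrent` threads
    // and returns the highest number that were ever running at the same time.
    static int runRecoveries(int tasks, int maxConcurrent) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(5); // hypothetical stand-in for one core's recovery work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent recoveries: " + runRecoveries(100, 4));
    }
}
```

With roughly 100 cores per node, an uncapped pool would let every core start recovering at the same moment; the cap trades a longer total recovery time for a bounded load on network and disk.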

We also avoid restarting all nodes at once: we bring up a few at a time and 
wait for their recovery to finish before starting more. We need to automate 
this, e.g. using a ZooKeeper lock pool so that nodes wait their turn to start 
up.
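
A minimal sketch of that gating idea, assuming a fixed pool of start-up permits. It uses an in-process java.util.concurrent.Semaphore purely as a stand-in; across real hosts the permits would have to live in ZooKeeper (for example via a shared-semaphore recipe such as the one in Apache Curator), which is not shown here. All names in the sketch are hypothetical.

```java
import java.util.concurrent.Semaphore;

public class StaggeredStartup {
    private final Semaphore startupPermits;

    StaggeredStartup(int maxNodesStartingAtOnce) {
        this.startupPermits = new Semaphore(maxNodesStartingAtOnce);
    }

    // Blocks until a permit is free, runs start-up and recovery,
    // then hands the permit back so the next waiting node can proceed.
    void startNode(Runnable startAndRecover) {
        startupPermits.acquireUninterruptibly();
        try {
            startAndRecover.run();
        } finally {
            startupPermits.release();
        }
    }

    // Number of permits currently unused.
    int freePermits() {
        return startupPermits.availablePermits();
    }

    public static void main(String[] args) {
        StaggeredStartup gate = new StaggeredStartup(3);
        gate.startNode(() -> System.out.println("node started and recovered"));
    }
}
```

The important property is that the permit is held until recovery has finished, not just until the process is up, so the next batch only starts once the previous one has settled.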

> Improve stability and startup performance of SolrCloud with thousands of 
> collections
> ------------------------------------------------------------------------------------
>
>                 Key: SOLR-7191
>                 URL: https://issues.apache.org/jira/browse/SOLR-7191
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 5.0
>            Reporter: Shawn Heisey
>            Assignee: Noble Paul
>              Labels: performance, scalability
>             Fix For: 6.3
>
>         Attachments: lots-of-zkstatereader-updates-branch_5x.log, 
> SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch, 
> SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch
>
>
> A user on the mailing list with thousands of collections (5000 on 4.10.3, 
> 4000 on 5.0) is having severe problems with getting Solr to restart.
> I tried as hard as I could to duplicate the user setup, but I ran into many 
> problems myself even before I was able to get 4000 collections created on a 
> 5.0 example cloud setup.  Restarting Solr takes a very long time, and it is 
> not very stable once it's up and running.
> This kind of setup is very much pushing the envelope on SolrCloud performance 
> and scalability.  It doesn't help that I'm running both Solr nodes on one 
> machine (I started with 'bin/solr -e cloud') and that ZK is embedded.



