[
https://issues.apache.org/jira/browse/SOLR-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15353247#comment-15353247
]
Scott Blum commented on SOLR-7191:
----------------------------------
This may be unrelated to the current patch work, but it seems relevant to the
uber ticket:
I rebooted our Solr cluster the other night to pick up an update, and I ran
into what seemed to be pathological behavior around state updates. My first
attempt to bring up everything at once resulted in utter deadlock, so I shut
everything down, manually nuked all the overseer queues/maps in ZK, and started
bringing the nodes up one at a time. What I saw was kind of astounding.
I was monitoring OVERSEERSTATUS and tracking the number of outstanding overseer
ops + the total number of update_state ops, and I noticed that every VM I
brought up needed ~4000 update_state ops to stabilize, despite the fact that
each VM only manages ~128 cores. We have 32 VMs with ~128 cores each, or ~4096
cores in our entire cluster... it took over 100,000 update_state operations to
bring the whole cluster up. That seems... insane. 3 or 4 update_state ops per
core would seem reasonable to me, but I saw over 30 ops per core loaded as I
went. This number was extremely consistent for every node I brought up.
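For reference, the arithmetic behind these figures can be sketched as below.
This is only a back-of-the-envelope check using the approximate numbers quoted
above; the function and variable names are made up for illustration and are
not part of Solr, and the "expected" budget simply takes the upper end of the
"3 or 4 ops per core" estimate.

```python
# Approximate figures quoted in the comment above.
VMS = 32                   # nodes in the cluster
CORES_PER_VM = 128         # cores hosted per node
OPS_PER_VM = 4000          # observed update_state ops per node brought up
EXPECTED_OPS_PER_CORE = 4  # assumption: upper end of the "3 or 4" estimate


def startup_ops_summary(vms, cores_per_vm, ops_per_vm, expected_per_core):
    """Compare observed update_state volume with a per-core budget."""
    total_cores = vms * cores_per_vm
    total_ops = vms * ops_per_vm
    expected_total = total_cores * expected_per_core
    return {
        "total_cores": total_cores,
        "total_ops": total_ops,
        "ops_per_core": ops_per_vm / cores_per_vm,
        "expected_total_ops": expected_total,
        "overhead_factor": total_ops / expected_total,
    }


summary = startup_ops_summary(
    VMS, CORES_PER_VM, OPS_PER_VM, EXPECTED_OPS_PER_CORE
)

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value}")
```

With these inputs the per-core figure comes out just above 30, and the
cluster-wide total lands above 100,000, which matches what was observed.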
> Improve stability and startup performance of SolrCloud with thousands of
> collections
> ------------------------------------------------------------------------------------
>
> Key: SOLR-7191
> URL: https://issues.apache.org/jira/browse/SOLR-7191
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 5.0
> Reporter: Shawn Heisey
> Assignee: Shalin Shekhar Mangar
> Labels: performance, scalability
> Attachments: SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch,
> SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch, SOLR-7191.patch,
> lots-of-zkstatereader-updates-branch_5x.log
>
>
> A user on the mailing list with thousands of collections (5000 on 4.10.3,
> 4000 on 5.0) is having severe problems with getting Solr to restart.
> I tried as hard as I could to duplicate the user's setup, but I ran into many
> problems myself even before I was able to get 4000 collections created on a
> 5.0 example cloud setup. Restarting Solr takes a very long time, and it is
> not very stable once it's up and running.
> This kind of setup is very much pushing the envelope on SolrCloud performance
> and scalability. It doesn't help that I'm running both Solr nodes on one
> machine (I started with 'bin/solr -e cloud') and that ZK is embedded.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)