[ https://issues.apache.org/jira/browse/SOLR-10277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joshua Humphries updated SOLR-10277:
------------------------------------
Hey, Varun,
I actually just got it into our production cluster yesterday. In our cluster
of just under 50 nodes, it reduced the average time to restart a node and
have it fully active (e.g. the overseer queue goes quiet and all collections
are updated) from about 2.5 minutes down to 55 seconds. Our tools for
examining and verifying cluster state also show that everything looked good.
----
Josh Humphries
FullStory <https://www.fullstory.com/> | Atlanta, GA
Software Engineer
[email protected]
On Fri, Mar 31, 2017 at 2:04 PM, Varun Thacker (JIRA) <[email protected]> wrote:
> On 'downnode', lots of wasteful mutations are done to ZK
> --------------------------------------------------------
>
> Key: SOLR-10277
> URL: https://issues.apache.org/jira/browse/SOLR-10277
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud
> Affects Versions: 5.5.3, 5.5.4, 6.0.1, 6.2.1, 6.3, 6.4.2
> Reporter: Joshua Humphries
> Assignee: Scott Blum
> Labels: leader, zookeeper
> Attachments: SOLR-10277-5.5.3.patch, SOLR-10277.patch
>
>
> When a node restarts, it submits a single 'downnode' message to the
> overseer's state update queue.
> When the overseer processes the message, it does way more writes to ZK than
> necessary. In our cluster of 48 hosts, the majority of collections have only
> 1 shard and 1 replica. So a single node restarting should only result in
> ~1/40th of the collections being updated with new replica states (to indicate
> that the node is no longer active).
> However, the current logic in NodeMutator#downNode always updates *every*
> collection. So we end up having to do rolling restarts very slowly to avoid
> a severe outage caused by the overseer doing far too much work for each host
> that is restarted. Messages for shards subsequently becoming leader can't be
> processed until the 'downnode' message is fully processed. So a fast rolling
> restart can result in the overseer queue growing incredibly large and nearly
> all shards winding up in a leader-less state until that backlog is processed.
> The fix is a trivial logic change to only add a ZkWriteCommand for
> collections that actually have an impacted replica.
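
For illustration, here is a minimal sketch of the kind of logic change
described above. It is not the attached patch and does not use Solr's real
classes: `DownNodeSketch`, `Replica`, and `planDownNodeWrites` are
hypothetical names, and the cluster state is modeled as a plain map from
collection name to replicas. The only point it shows is that a write is
emitted for a collection only when one of its replicas lives on the down node.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model; not Solr's NodeMutator or ZkWriteCommand. */
public class DownNodeSketch {

    /** Simplified replica: the node that hosts it and its current state. */
    static final class Replica {
        final String nodeName;
        final String state;

        Replica(String nodeName, String state) {
            this.nodeName = nodeName;
            this.state = state;
        }
    }

    /**
     * Returns the updated replica lists, keyed by collection name, for only
     * those collections that have at least one replica on the down node.
     * Collections with no replica on that node are left out entirely, so no
     * write would be issued for them.
     */
    static Map<String, List<Replica>> planDownNodeWrites(
            Map<String, List<Replica>> clusterState, String downNodeName) {
        Map<String, List<Replica>> writes = new LinkedHashMap<>();
        for (Map.Entry<String, List<Replica>> entry : clusterState.entrySet()) {
            boolean impacted = false;
            List<Replica> updated = new ArrayList<>();
            for (Replica replica : entry.getValue()) {
                if (replica.nodeName.equals(downNodeName)) {
                    // Replica lives on the restarting node: mark it down.
                    updated.add(new Replica(replica.nodeName, "down"));
                    impacted = true;
                } else {
                    updated.add(replica);
                }
            }
            // Previously described behavior: a write for every collection.
            // Sketched fix: a write only when a replica was actually changed.
            if (impacted) {
                writes.put(entry.getKey(), updated);
            }
        }
        return writes;
    }
}
```

With a cluster where most collections have 1 shard and 1 replica, a call such
as `planDownNodeWrites(state, "node1")` would return entries only for the
collections hosted on `node1`, rather than one entry per collection.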