[
https://issues.apache.org/jira/browse/SOLR-6491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135171#comment-14135171
]
Ramkumar Aiyengar commented on SOLR-6491:
-----------------------------------------
These are my concerns with the leadership mechanism as it currently stands,
with no balancing (which can result in leaders all ganging up on one set of
machines). I am speaking from experience with an NRT system with a fairly
high rate of indexing, a very low commit interval, and hundreds of shards (50+
on each machine).
* The biggest performance issue is not during normal indexing but when some
replicas are recovering. In such a case, the machines with leaders have to
service 50+ IO-intensive recovery operations, and indexing can really take a
hit during this time (we have seen indexing latency rise to a few times its
normal level).
** SOLR-6485 somewhat improves this situation, but it is really a compromise.
It increases the time taken for recovery when you could instead spread the IO
load across different machines. It doesn't prevent "spikiness" (you hit IO hard
for a few hundred ms, then stay quiet for a few hundred ms more). And it is
risky in a cloud environment because recovery can be spontaneous (say, after a
ZK disconnect): in such a case the system is already vulnerable due to
unplanned, reduced capacity, and a slower recovery prolongs that situation.
* Overseer is hit harder when a machine with leaders dies or goes down, or if
there's a ZK expiry on a Solr instance whose cores are all leaders. You have a
lot more elections happening at the same time, and despite the various
improvements made to Overseer recently, it is ultimately bound by how fast ZK
can respond. This in turn increases the amount of time replicas spend without
seeing a leader, and hence ingestion slows down considerably.
** A lesser case of this is when an instance encounters a ZK expiry: if all
the leaders gang up in one place, you need a new election for each one of the
cores in it.
* If the machine containing the leaders dies, then there's an ephemeral node
timeout, which affects indexing in general even before elections kick in.
This is a lot worse (it affects a lot more documents) if leadership is
concentrated on one machine.
* Even if instances on a 'leader' machine are shutting down in an orderly
fashion, there's a delay between the instance shutting down and the instance
losing its leadership because of the servlet model we are currently tied to
(the container first refuses connections, then gets the servlet to handle the
shutdown). Having leaders in one place leads to more documents being affected
by this. I agree that this could potentially be solved by other mechanisms,
for example by having a separate handler which forces cores to let go of
leadership and is called by a script prior to shutdown, or ideally by getting
rid of the servlet model, as is the long-term plan.
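To check whether leaders are ganging up like this, one could count leaders per
node from the cluster state. The following is only a minimal Python sketch. It
assumes the state has already been loaded into a dict shaped like
clusterstate.json (collection -> shards -> replicas, with the leader replica
carrying leader="true"); the function name and the sample data are
hypothetical.

```python
from collections import Counter


def leaders_per_node(cluster_state):
    """Count how many shard leaders each node currently hosts."""
    counts = Counter()
    for collection in cluster_state.values():
        for shard in collection.get("shards", {}).values():
            for replica in shard.get("replicas", {}).values():
                # Assumption: the leader replica is flagged with leader="true".
                if replica.get("leader") == "true":
                    counts[replica["node_name"]] += 1
    return counts


# Hypothetical two-shard collection with both leaders on the same machine.
example_state = {
    "collection1": {
        "shards": {
            "shard1": {"replicas": {
                "core_node1": {"node_name": "host1:8983_solr", "leader": "true"},
                "core_node2": {"node_name": "host2:8983_solr"},
            }},
            "shard2": {"replicas": {
                "core_node3": {"node_name": "host1:8983_solr", "leader": "true"},
                "core_node4": {"node_name": "host2:8983_solr"},
            }},
        }
    }
}

# List nodes from most to fewest leaders.
for node, count in leaders_per_node(example_state).most_common():
    print(node, count)
```

A node that shows up with a count close to its total number of cores is the
kind of 'leader' machine described in the points above.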
> Add preferredLeader as a ROLE and a collections API command to respect this
> role
> --------------------------------------------------------------------------------
>
> Key: SOLR-6491
> URL: https://issues.apache.org/jira/browse/SOLR-6491
> Project: Solr
> Issue Type: Improvement
> Affects Versions: 4.11, 5.0
> Reporter: Erick Erickson
> Assignee: Erick Erickson
>
> Leaders can currently get out of balance due to the sequence of how nodes are
> brought up in a cluster. For very good reasons shard leadership cannot be
> permanently assigned.
> However, it seems reasonable that a sys admin could optionally specify that a
> particular node be the _preferred_ leader for a particular collection/shard.
> During leader election, preference would be given to any node so marked when
> electing any leader.
> So the proposal here is to add another role for preferredLeader to the
> collections API, something like
> ADDROLE?role=preferredLeader&collection=collection_name&shard=shardId
> Second, it would be good to have a new collections API call like
> ELECTPREFERREDLEADERS?collection=collection_name
> (I really hate that name so far, but you see the idea). That command would
> (asynchronously?) make an attempt to transfer leadership for each shard in a
> collection to the node labeled as the preferred leader by the new ADDROLE
> role.
> I'm going to start working on this; any suggestions are welcome!
> This will subsume several other JIRAs, I'll link them momentarily.
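For illustration, the two calls proposed in the description could be built as
Collections API URLs roughly as follows. This is a sketch of the proposal only:
ADDROLE with role=preferredLeader and ELECTPREFERREDLEADERS are the names
suggested in the issue, not existing commands, and the base URL and helper
names below are assumptions.

```python
from urllib.parse import urlencode

# Assumed address of a Solr node; adjust for a real cluster.
SOLR_BASE = "http://localhost:8983/solr"


def collections_api_url(action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{SOLR_BASE}/admin/collections?{query}"


def add_preferred_leader_url(collection, shard):
    # Proposed in SOLR-6491: mark a preferred leader for one shard.
    # Only the parameters shown in the issue description are used here.
    return collections_api_url(
        "ADDROLE", role="preferredLeader", collection=collection, shard=shard
    )


def elect_preferred_leaders_url(collection):
    # Proposed in SOLR-6491 (the name is explicitly provisional).
    return collections_api_url("ELECTPREFERREDLEADERS", collection=collection)


print(add_preferred_leader_url("collection_name", "shardId"))
print(elect_preferred_leaders_url("collection_name"))
```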