[
https://issues.apache.org/jira/browse/SOLR-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030666#comment-16030666
]
Erick Erickson commented on SOLR-10780:
---------------------------------------
As the original author of all that REBALANCELEADERS stuff, I'll be happy to see
it go away; it's always been arcane ;)....
The intent of the original was to prevent 100s of leaders being on the same
Solr instance in cases where there were many, many shards spread across many
machines and each machine would host a replica of each shard. In that case
measurable performance degradation happened because, even though the extra work
for the leader wasn't onerous, the cumulative extra work was.
And since there is no use for BALANCESHARDUNIQUE other than preferredLeader
(that I know of), this and the REBALANCELEADERS API commands are overkill.
I think the intent of this functionality can be implemented much more simply.
When a replica comes up and has become active, it could examine the state
of the collection and, if it notes "too many" leaders on a particular node,
simply request that it become the leader of its shard.
By waiting until it's active, we should avoid conditions where a replica wants
to become the leader but hasn't synced.
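To make the idea concrete, here is a rough sketch of the check a newly active replica might run, written in Python for brevity. Everything in it is an assumption for illustration only: the shape of the cluster state (a dict of shard name to the node hosting its leader), the max_leaders_per_node threshold standing in for "too many", and the function name. None of these are existing Solr APIs.

```python
from collections import Counter


def should_request_leadership(shard_leaders, my_shard, my_node, max_leaders_per_node):
    """Decide whether the active replica on my_node should ask to lead my_shard.

    shard_leaders: hypothetical view of the collection state, mapping
                   shard name -> node that currently hosts that shard's leader.
    max_leaders_per_node: hypothetical threshold for "too many" leaders.
    """
    current_leader_node = shard_leaders.get(my_shard)
    # No leader yet (normal election handles that) or we already lead it.
    if current_leader_node is None or current_leader_node == my_node:
        return False

    leaders_per_node = Counter(shard_leaders.values())
    # The node holding our shard's leader has "too many" leaders.
    leader_node_overloaded = leaders_per_node[current_leader_node] > max_leaders_per_node
    # Do not take over if that would just move the problem to this node.
    would_overload_me = leaders_per_node[my_node] + 1 > max_leaders_per_node
    return leader_node_overloaded and not would_overload_me
```

The replica would only call this after reaching the active state, so the "hasn't synced" case never gets as far as making a request.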
I think this is quite legitimate as part of the general autoscaling effort; the
time is now.
Let's say I have 100 nodes, 100 shards and 100 replicas/shard. That is, each
node hosts one replica for each shard. Now I run around and start up all the
nodes. How do we keep from unnecessary leadership changes? Maybe throttle this
somehow?
Or two replicas for the same shard request leadership at the same time....
Or is this the Overseer's job? Something like a "balancing thread" that notices
this condition and sends "you should be leader" messages to particular
replicas. Or something that has a global view of what's happening cluster wide
(as yet undefined)...
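For the "global view" variant, a minimal sketch of what such a balancing pass could compute is below, again in Python and again hypothetical throughout: the input dicts, the max_moves throttle, and the function name are made up for illustration and are not Overseer code. Because one planner picks at most one new leader per shard, two replicas of the same shard never compete, and max_moves caps how many leadership changes happen per pass.

```python
import math
from collections import Counter


def plan_leader_moves(leaders, replicas, max_moves):
    """Plan a throttled set of leadership changes from a cluster-wide view.

    leaders:   hypothetical dict, shard name -> node hosting the current leader.
    replicas:  hypothetical dict, shard name -> list of nodes with an active replica.
    max_moves: throttle on the number of changes planned in one pass.
    Returns a list of (shard, new_leader_node) tuples.
    """
    nodes = {node for hosts in replicas.values() for node in hosts}
    if not nodes:
        return []
    # An even spread puts at most this many leaders on any one node.
    target = math.ceil(len(leaders) / len(nodes))
    counts = Counter(leaders.values())

    moves = []
    for shard in sorted(leaders):
        if len(moves) >= max_moves:
            break  # throttle: leave the rest for a later pass
        current = leaders[shard]
        if counts[current] <= target:
            continue  # this leader's node is not overloaded
        candidates = [n for n in replicas.get(shard, [])
                      if n != current and counts[n] < target]
        if not candidates:
            continue
        # Prefer the least-loaded node; break ties by name for determinism.
        new_leader = min(candidates, key=lambda n: (counts[n], n))
        counts[current] -= 1
        counts[new_leader] += 1
        moves.append((shard, new_leader))
    return moves
```

Each planned move would then become one "you should be leader" message to the chosen replica.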
> A new collection property autoRebalanceLeaders
> -----------------------------------------------
>
> Key: SOLR-10780
> URL: https://issues.apache.org/jira/browse/SOLR-10780
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud
> Reporter: Noble Paul
>
> In SolrCloud, the first replica to get started in a given shard becomes the
> leader of that shard. This is a problem during cluster restarts: the first
> node to get started has all the leaders, and that node ends up being very
> heavily loaded. The solution we have today is to invoke a REBALANCELEADERS
> command explicitly so that the system ends up with a uniform distribution of
> leaders across nodes. This is a manual operation and we can make the system
> do it automatically.
> So each collection can have an {{autoRebalanceLeaders}} flag. If it is set
> to true, whenever a replica becomes {{ACTIVE}} in a shard, a
> {{REBALANCELEADERS}} is invoked for that shard.