[
https://issues.apache.org/jira/browse/SOLR-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15366690#comment-15366690
]
Erick Erickson commented on SOLR-7280:
--------------------------------------
bq: I don't think it takes a weird topology - just more replicas than threads to
load them in a shard.
OK, I think I see what you're saying. You're talking about a "deep" topology,
i.e. one with many replicas on a particular shard on a particular instance and
I was looking at a "wide" topology, many collections per instance but each
shard had only a few replicas. I've seen both in the field as I'm sure you
have....
How much of both situations would be handled by creating an ordered list of all
replicas that were leaders and loading those first then loading an ordered list
of all replicas that weren't labeled as leader? There's still the case of a
zillion leaders on a single instance, so some heuristic like you suggest seems
to be in order.
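To make the ordering idea concrete, here is a minimal sketch, not code from the patch. ReplicaInfo and loadOrder are hypothetical names invented for illustration (they are not Solr classes), and the sketch assumes each local replica already carries a flag saying whether it was last labeled leader. It just puts leaders first, then non-leaders, each group sorted by core name:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LeaderFirstOrdering {

    /** Hypothetical stand-in for a core descriptor; NOT a Solr class. */
    public static final class ReplicaInfo {
        final String coreName;
        final boolean leader; // assumed: last known "leader" label for this replica

        public ReplicaInfo(String coreName, boolean leader) {
            this.coreName = coreName;
            this.leader = leader;
        }
    }

    /**
     * Returns core names in the order they would be submitted for loading:
     * all leaders first, then all non-leaders, each group sorted by name.
     */
    public static List<String> loadOrder(List<ReplicaInfo> replicas) {
        List<ReplicaInfo> sorted = new ArrayList<>(replicas);
        sorted.sort(Comparator
                // Boolean false sorts before true, so negate to put leaders first.
                .comparing((ReplicaInfo r) -> !r.leader)
                .thenComparing(r -> r.coreName));

        List<String> names = new ArrayList<>();
        for (ReplicaInfo r : sorted) {
            names.add(r.coreName);
        }
        return names;
    }
}
```

This does nothing for the "zillion leaders on a single instance" case; it only fixes the order in which cores are handed to the load threads.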
I'll emphasize though that the current code (without this patch) can prevent a
cluster from coming up at _all_. With this patch the cluster at least comes up,
albeit slowly if the leaderVoteWait comes into play. Bumping the number of
threads to > the max replicas for a shard can handle the case you mentioned,
while keeping it "reasonable" can deal with the one I'm seeing.
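As a rough illustration of "threads > the max replicas for a shard", here is a small sketch under stated assumptions. The class and method names are hypothetical, the input is assumed to be one shard identifier per replica hosted on this instance (for example "collection/shard"), and ">" is read as "one more than". It is not how Solr computes anything today:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CoreLoadThreadSizing {

    /**
     * Given one shard id per locally hosted replica, returns a thread count
     * strictly greater than the largest number of local replicas of any
     * single shard. Returns 1 when there are no replicas.
     */
    public static int threadsToCoverDeepestShard(List<String> shardIdOfEachLocalReplica) {
        Map<String, Integer> replicasPerShard = new HashMap<>();
        int deepest = 0;
        for (String shardId : shardIdOfEachLocalReplica) {
            // merge returns the updated count for this shard
            int count = replicasPerShard.merge(shardId, 1, Integer::sum);
            deepest = Math.max(deepest, count);
        }
        return deepest + 1;
    }
}
```

For a "wide" layout with only a replica or two per shard this gives a small number, which is why a separate, explicitly configured cap still matters there.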
That said, I think the default should be quite high in the cloud case so we
don't change the current behavior, and leave it to situations like the one I'm
seeing to configure this explicitly. I think it defaults to 8 currently; perhaps
100 (or unlimited) instead in cloud mode?
How much of all of the above makes this patch "good enough for now" with
perhaps follow-ons on more sophisticated approaches?
> Load cores in sorted order and tweak coreLoadThread counts to improve cluster
> stability on restarts
> ---------------------------------------------------------------------------------------------------
>
> Key: SOLR-7280
> URL: https://issues.apache.org/jira/browse/SOLR-7280
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Reporter: Shalin Shekhar Mangar
> Assignee: Noble Paul
> Fix For: 5.2, 6.0
>
> Attachments: SOLR-7280.patch
>
>
> In SOLR-7191, Damien mentioned that by loading solr cores in a sorted order
> and tweaking some of the coreLoadThread counts, he was able to improve the
> stability of a cluster with thousands of collections. We should explore some
> of these changes and fold them into Solr.