[
https://issues.apache.org/jira/browse/SOLR-12386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383710#comment-17383710
]
Mark Robert Miller edited comment on SOLR-12386 at 7/20/21, 2:49 AM:
---------------------------------------------------------------------
{quote}Retries and attempts from everyone to create core zk nodes
{quote}
I should illustrate that a bit.
Let's say you start up 100 Solr servers for a nice new cluster. Or an existing
cluster.
Generally, what is going to happen is that every Solr instance is going to hit
ZK and do something like ensure /configs exists. And makePath on
/path1/path2/path3, which may be 3 calls, just in case path1 and path2 don't
yet exist. So what essentially happens is that, all the time, we have 100 Solr
servers trying / retrying to make the same paths, the same existing or
about-to-exist path parts, etc. Racing each other to get those same path parts
in for a path.
So we do something like boot up a new cluster, and the ZK base layout could
maybe be created with, let's say, 15 ZK calls. And maybe we make 900
(*generously* conservative). And hundreds more on a restart for nodes that are
created on day 1, instant 1. Maybe we recreate nodes someone / some process
just tried to delete in this process. Since independent things in lots of
random places are trying to ensure nodes exist (that should be one and done on
first startup, or collection create, etc), maybe you end up with all kinds of
ZK calls from all these servers even at random times outside startup, restart,
collection create. When you could simply have a one-time instance of: create
these dozen paths, one client says it, it's done, case closed forevermore.
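To put rough numbers on that, here is a minimal sketch of the call-count
difference. It assumes a toy in-memory stand-in for ZK (FakeZk, make_path and
the two driver functions are all hypothetical names, not Solr or ZooKeeper
APIs), runs the servers one after another rather than truly racing, and counts
each create attempt as one call.

```python
# Toy in-memory stand-in for ZooKeeper. This is NOT the Solr or ZooKeeper API;
# every name here is hypothetical and exists only to count calls.
class FakeZk:
    def __init__(self):
        self.nodes = set()
        self.calls = 0

    def create(self, path):
        """One simulated round trip. Returns False if the node already exists."""
        self.calls += 1
        if path in self.nodes:
            return False  # comparable to a "node exists" failure
        self.nodes.add(path)
        return True


def prefixes(path):
    """'/a/b/c' -> ['/a', '/a/b', '/a/b/c']"""
    parts = path.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def make_path(zk, path):
    """Naive makePath: attempt every path part, just in case it is missing."""
    for prefix in prefixes(path):
        zk.create(prefix)


def every_server_ensures(num_servers, paths):
    """Each server independently ensures the whole layout exists."""
    zk = FakeZk()
    for _ in range(num_servers):
        for path in paths:
            make_path(zk, path)
    return zk


def one_client_bootstraps(paths):
    """A single client creates each path part exactly once."""
    zk = FakeZk()
    created = set()
    for path in paths:
        for prefix in prefixes(path):
            if prefix not in created:
                created.add(prefix)
                zk.create(prefix)
    return zk


PATHS = ["/configs", "/path1/path2/path3"]

if __name__ == "__main__":
    crowded = every_server_ensures(100, PATHS)
    single = one_client_bootstraps(PATHS)
    print("calls with 100 servers ensuring:", crowded.calls)
    print("calls with one client bootstrapping:", single.calls)
    print("same resulting layout:", crowded.nodes == single.nodes)
```

Both approaches end with the same layout. The only difference in this sketch
is how many calls it took to get there, and a real cluster adds retries and
genuine races on top of that.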
> Test fails for "Can't find resource" for files in the _default configset
> ------------------------------------------------------------------------
>
> Key: SOLR-12386
> URL: https://issues.apache.org/jira/browse/SOLR-12386
> Project: Solr
> Issue Type: Test
> Components: SolrCloud
> Reporter: David Smiley
> Priority: Minor
> Attachments: cant find resource, stacktrace.txt
>
>
> Some tests, especially ConcurrentCreateRoutedAliasTest, have sporadically
> failed with the message "Can't find resource" pertaining to a
> file that is in the default ConfigSet yet mysteriously can't be found. This
> happens when a collection is being created that ultimately fails for this
> reason.