[jira] [Comment Edited] (SOLR-12386) Test fails for "Can't find resource" for files in the _default configset

Mark Robert Miller (Jira) Mon, 19 Jul 2021 19:50:03 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-12386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383710#comment-17383710
 ]


Mark Robert Miller edited comment on SOLR-12386 at 7/20/21, 2:49 AM:
---------------------------------------------------------------------

{quote}Retries and attempts from everyone to create core zk nodes
{quote}
I should illustrate that a bit.

Let's say you start up 100 Solr servers for a nice new cluster. Or an existing 
cluster.

Generally, every Solr instance is going to hit ZK and do something like ensure 
/configs exists, and call makePath on /path1/path2/path3, which may be 3 calls, 
just in case path1 and path2 don't yet exist. So what essentially happens is 
that, all the time, we have 100 Solr servers trying and retrying to make the 
same paths, the same existing or about-to-exist path parts, racing each other 
to get those same path parts in for a path.
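
To make that redundancy concrete, here is a minimal sketch in Python that 
simulates it. FakeZk is a hypothetical in-memory stand-in, not the ZooKeeper or 
Solr client API, and make_path only mimics the "create every path part, just in 
case" behavior described above. The servers run one after another here rather 
than truly racing; the order affects who wins each create, not how many calls 
are made.

```python
class FakeZk:
    """Hypothetical in-memory stand-in for ZK; it only counts create attempts."""

    def __init__(self):
        self.nodes = set()
        self.create_calls = 0
        self.already_existed = 0

    def create(self, path):
        # Every attempt costs a round trip, whether or not the node exists.
        self.create_calls += 1
        if path in self.nodes:
            self.already_existed += 1
            return False
        self.nodes.add(path)
        return True


def make_path(zk, path):
    """Create each part of `path` in turn, in case a parent is missing."""
    current = ""
    for part in path.strip("/").split("/"):
        current += "/" + part
        zk.create(current)


def simulate(num_servers, path):
    """Have every simulated server ensure the same path on one shared FakeZk."""
    zk = FakeZk()
    for _ in range(num_servers):
        make_path(zk, path)
    return zk


zk = simulate(100, "/path1/path2/path3")
print(zk.create_calls, zk.already_existed, len(zk.nodes))  # 300 297 3
```

In this toy model, 297 of the 300 create attempts hit a node that some other 
server had already made.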

So we boot up a new cluster, and the ZK base layout could maybe be created 
with, let's say, 15 ZK calls. And maybe we make 900 (a *generously* 
conservative estimate). And hundreds more on a restart, for nodes that were 
created on day 1, instant 1. Maybe we recreate nodes that someone or some 
process just tried to delete in the process. Since independent things in lots 
of random places are trying to ensure nodes exist (which should be one and done 
on first startup, or collection create, etc.), maybe you end up with all kinds 
of ZK calls from all these servers even at random times outside startup, 
restart, or collection create. When instead you could simply have a one-time 
step: create these dozen paths, one client does it, it's done, case closed 
forevermore.
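
As a rough sketch of that comparison in Python: the code below counts write 
attempts when every server re-ensures the whole layout on each start, versus a 
single client creating it once. Everything here is an assumption for 
illustration: BASE_LAYOUT is a dozen placeholder paths (not Solr's real znode 
names), CountingStore is a hypothetical in-memory stand-in for ZK, and the 
server and restart counts are arbitrary.

```python
# Hypothetical base layout: a dozen placeholder paths, not real Solr znodes.
BASE_LAYOUT = [f"/base/node{i}" for i in range(12)]


class CountingStore:
    """Hypothetical in-memory stand-in for ZK that counts write attempts."""

    def __init__(self):
        self.nodes = set()
        self.calls = 0

    def create_if_absent(self, path):
        # Each attempt is one call, even when the node is already there.
        self.calls += 1
        self.nodes.add(path)


def everyone_ensures(store, num_servers, restarts):
    """Every server ensures the whole layout on first start and each restart."""
    for _ in range(num_servers * (1 + restarts)):
        for path in BASE_LAYOUT:
            store.create_if_absent(path)


def one_time_bootstrap(store):
    """A single client creates the layout once; servers assume it exists."""
    for path in BASE_LAYOUT:
        store.create_if_absent(path)


racing = CountingStore()
everyone_ensures(racing, num_servers=100, restarts=1)

once = CountingStore()
one_time_bootstrap(once)

print(racing.calls, once.calls, racing.nodes == once.nodes)  # 2400 12 True
```

Both approaches end with the same set of nodes; only the number of calls 
differs.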


was (Author: markrmiller):
{quote}Retries and attempts from everyone to create core zk nodes
{quote}
I should illustrate that a bit.

Let's say you start up 100 Solr servers for a nice new cluster. Or an existing 
cluster.

Generally, every Solr instance is going to hit ZK and do something like ensure 
/configs exists, and call makePath on /path1/path2/path3, which may be 3 calls, 
just in case path1 and path2 don't yet exist. So what essentially happens is 
that, all the time, we have 100 Solr servers trying and retrying to make the 
same paths, the same existing or about-to-exist path parts, racing each other 
to get those same path parts in for a path.

So we boot up a new cluster, and the ZK base layout could maybe be created 
with, let's say, 15 ZK calls. And maybe we make 900. And hundreds more on a 
restart, for nodes that were created on day 1, instant 1. Maybe we recreate 
nodes that someone or some process just tried to delete in the process. Since 
independent things in lots of random places are trying to ensure nodes exist 
(which should be one and done on first startup, or collection create, etc.), 
maybe you end up with all kinds of ZK calls from all these servers even at 
random times outside startup, restart, or collection create. When instead you 
could simply have a one-time step: create these dozen paths, one client does 
it, it's done, case closed forevermore.

> Test fails for "Can't find resource" for files in the _default configset
> ------------------------------------------------------------------------
>
>                 Key: SOLR-12386
>                 URL: https://issues.apache.org/jira/browse/SOLR-12386
>             Project: Solr
>          Issue Type: Test
>          Components: SolrCloud
>            Reporter: David Smiley
>            Priority: Minor
>         Attachments: cant find resource, stacktrace.txt
>
>
> Some tests, especially ConcurrentCreateRoutedAliasTest, have sporadically 
> failed with the message "Can't find resource" pertaining to a file that is 
> in the default ConfigSet yet mysteriously can't be found.  This happens 
> while a collection is being created, and the creation ultimately fails for 
> this reason.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

