[ 
https://issues.apache.org/jira/browse/SOLR-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205251#comment-15205251
 ] 

Alan Woodward commented on SOLR-8862:
-------------------------------------

I've tried to dig a bit and see when everything here is run within the Jetty 
lifecycle, and it turns out that... it's complicated!
* In a normal Solr setup, running using the Jetty start.jar, the 
SolrDispatchFilter is instantiated during startup (Jetty instantiates its 
Filters, and then its Servlets), and it won't serve any requests until all 
filters and servlets are fully constructed and have finished initialising.  So 
there could be a significant gap between registering the live_nodes znode and 
requests actually being served, particularly if there are other servlets within 
the container that take their time in starting up.
* In JettySolrRunner, the SDF is instantiated within a jetty LifecycleListener 
(of which more below), which is called *after* Jetty has started listening on 
its port.  Requests won't be served via the filter until it has finished 
instantiating, but the gap here is smaller.

In both cases we have a race.  Ideally, we want to instantiate the filters, and 
only register ourselves with the cluster once we know we're serving requests, 
so we need a way to be notified that everything is ready to go:
* The standard servlet API exposes ServletContextListeners, but their 
contextInitialized callback runs *before* any filter or servlet is initialised 
(and contextDestroyed only at shutdown), so these aren't any use.  We need to be 
notified *after* startup.
* Jetty allows you to register LifecycleListeners that get called before and 
after startup and shutdown, which is exactly what we want.  Hurrah!

So what we really need to do here is to separate out CoreContainer 
construction, loading of cores, and creation of the live_nodes znode.  The 
container should be constructed and load its cores during server startup, and 
only then register itself with the cluster from a LifecycleListener.
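
To make the intended ordering concrete, here is a rough, stdlib-only sketch of that split. Everything in it is hypothetical: Server, LifecycleListener and Container are stand-ins I've made up for illustration, not the real Jetty or Solr classes, and the real wiring would differ. The only point is that registration happens in the "started" callback, after the container is built and requests can be served.

```java
import java.util.ArrayList;
import java.util.List;

public class StartupOrderSketch {

    /** Hypothetical listener, loosely modelled on the idea of a container lifecycle listener. */
    interface LifecycleListener {
        void starting(); // called before filters/servlets are initialised
        void started();  // called once everything is initialised and serving
    }

    /** Hypothetical stand-in for the servlet container. */
    static class Server {
        private final List<LifecycleListener> listeners = new ArrayList<>();
        private final List<String> log;

        Server(List<String> log) { this.log = log; }

        void addListener(LifecycleListener listener) { listeners.add(listener); }

        void start(Runnable initFilters) {
            for (LifecycleListener l : listeners) l.starting();
            initFilters.run();            // filters and servlets are built here
            log.add("serving requests");  // only now can requests get through
            for (LifecycleListener l : listeners) l.started();
        }
    }

    /** Hypothetical stand-in for CoreContainer, with the three phases kept separate. */
    static class Container {
        private final List<String> log;

        Container(List<String> log) {
            this.log = log;
            log.add("container constructed");
        }

        void loadCores() { log.add("cores loaded"); }

        void registerLiveNode() { log.add("live_nodes registered"); }
    }

    /** Runs one simulated startup and returns the events in the order they happened. */
    public static List<String> run() {
        List<String> log = new ArrayList<>();
        Server server = new Server(log);
        final Container[] holder = new Container[1];

        // Registration is deferred until the server reports that it has started.
        server.addListener(new LifecycleListener() {
            public void starting() { /* nothing to do before startup */ }
            public void started() { holder[0].registerLiveNode(); }
        });

        // Construction and core loading happen during startup, as part of filter init.
        server.start(() -> {
            holder[0] = new Container(log);
            holder[0].loadCores();
        });
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In this toy model the "live_nodes registered" event always comes after "serving requests", which is the property we want; with registration done inside the constructor it would come first.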

It's not ideal that we have two different code paths here, one for 'proper' 
solr running using start.jar and xml configuration, and one for programmatic 
startup via JettySolrRunner, but I guess we can live with that for a while.

On a separate note, SOLR-8323 should help with waiting for collections to be 
searchable.

> /live_nodes is populated too early to be very useful for clients -- 
> CloudSolrClient (and MiniSolrCloudCluster.createCollection) need some other 
> ephemeral zk node to know which servers are "ready"
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-8862
>                 URL: https://issues.apache.org/jira/browse/SOLR-8862
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Hoss Man
>
> {{/live_nodes}} is populated surprisingly early (and multiple times) in the 
> life cycle of a Solr node startup, and as a result probably shouldn't be used 
> by {{CloudSolrClient}} (or other "smart" clients) for deciding what servers 
> are fair game for requests.
> we should either fix {{/live_nodes}} to be created later in the lifecycle, or 
> add some new ZK node for this purpose.
> {panel:title=original bug report}
> I haven't been able to make sense of this yet, but what i'm seeing in a new 
> SolrCloudTestCase subclass i'm writing is that the code below, which 
> (reasonably) attempts to create a collection immediately after configuring 
> the MiniSolrCloudCluster gets a "SolrServerException: No live SolrServers 
> available to handle this request" -- in spite of the fact that (as far as i 
> can tell at first glance) MiniSolrCloudCluster's constructor is supposed to 
> block until all the servers are live.
> {code}
>     configureCluster(numServers)
>       .addConfig(configName, configDir.toPath())
>       .configure();
>     Map<String, String> collectionProperties = ...;
>     assertNotNull(cluster.createCollection(COLLECTION_NAME, numShards, 
> repFactor,
>                                            configName, null, null, 
> collectionProperties));
> {code}
> {panel}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)