[
https://issues.apache.org/jira/browse/SOLR-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483920#comment-14483920
]
Timothy Potter commented on SOLR-7361:
--------------------------------------
bq. Isn't it a problem that live_nodes is set up but the cores aren't, only
when the core isn't marked down?
You're right about requests being sent to cores, but I think the problem here
is admin-type requests, such as those coming from the Overseer to create a new
collection. This is actually how we found this in a big cluster: after a
restart, some cores were slow to load (due to the suggester dictionary issue)
and create collection requests started to fail.
In general, I don't think all cores should be blocked from being accessed until
the slowest core is loaded, so I'm thinking we need to re-think how cores are
loaded in the background in cloud mode.
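
To make the idea of per-core availability concrete, here is a minimal sketch in plain java.util.concurrent. It is not how CoreContainer works today; the class PerCoreReadinessSketch and its methods startLoading, awaitReady and isReady are hypothetical names used only for illustration. Each core gets its own future, so a request for one core only has to check that core's state and never waits on the slowest one.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration only; none of these names exist in Solr.
class PerCoreReadinessSketch {
    // One future per core, keyed by core name.
    private final Map<String, CompletableFuture<Void>> loading = new ConcurrentHashMap<>();
    private final ExecutorService loader = Executors.newCachedThreadPool();

    // Starts loading a core on a background thread and returns immediately.
    void startLoading(String coreName, Runnable loadWork) {
        loading.put(coreName, CompletableFuture.runAsync(loadWork, loader));
    }

    // Blocks until the named core has finished loading (no-op for unknown cores).
    void awaitReady(String coreName) {
        CompletableFuture<Void> f = loading.get(coreName);
        if (f != null) {
            f.join();
        }
    }

    // True only once this particular core has loaded successfully.
    boolean isReady(String coreName) {
        CompletableFuture<Void> f = loading.get(coreName);
        return f != null && f.isDone() && !f.isCompletedExceptionally();
    }

    void shutdown() {
        loader.shutdown();
    }

    public static void main(String[] args) {
        PerCoreReadinessSketch cores = new PerCoreReadinessSketch();
        CountDownLatch gate = new CountDownLatch(1);

        // The "slow" core stays in loading state until the gate is opened.
        cores.startLoading("slow", () -> {
            try {
                gate.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        cores.startLoading("fast", () -> { });

        cores.awaitReady("fast");
        System.out.println("fast ready: " + cores.isReady("fast"));
        System.out.println("slow ready: " + cores.isReady("slow"));

        gate.countDown();
        cores.awaitReady("slow");
        System.out.println("slow ready after gate opened: " + cores.isReady("slow"));
        cores.shutdown();
    }
}
```

In this sketch a request handler would call isReady for the target core and reject or retry if it returns false, which is one possible policy among several.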
> Main Jetty thread blocked by core loading delays HTTP listener from binding
> if core loading is slow
> ---------------------------------------------------------------------------------------------------
>
> Key: SOLR-7361
> URL: https://issues.apache.org/jira/browse/SOLR-7361
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Reporter: Timothy Potter
>
> During server startup, the CoreContainer uses an ExecutorService to load
> cores in multiple background threads but then blocks until all cores are
> loaded; see CoreContainer#load around line 290 on trunk (invokeAll). From the
> JavaDoc on that method, we have:
> {quote}
> Executes the given tasks, returning a list of Futures holding their status
> and results when all complete. Future.isDone() is true for each element of
> the returned list.
> {quote}
> In other words, this is a blocking call.
> This delays the Jetty HTTP listener from binding and accepting requests until
> all cores are loaded. Do we need to block the main thread?
> Also, prior to this happening, the node is registered as a live node in ZK,
> which makes it a candidate for receiving requests from the Overseer, such as
> to service a create collection request. The problem of course is that the
> node listed in /live_nodes isn't accepting requests yet. So we either need to
> unblock the main thread during server loading or maybe wait longer before we
> register as a live node ... not sure which is the better way forward?
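
The blocking behavior described in the issue can be reproduced outside Solr with the standard ExecutorService API. The sketch below is an illustration under stated assumptions, not the CoreContainer code: the class CoreLoadBlockingSketch and the helper loadCore are hypothetical, and Thread.sleep stands in for real core loading. It compares how long the calling thread is held by invokeAll with how long it is held when each task is handed off with submit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not taken from the Solr code base.
class CoreLoadBlockingSketch {

    // Stand-in for loading one core: sleeps to simulate slow startup work.
    static Callable<String> loadCore(String name, long millis) {
        return () -> {
            Thread.sleep(millis);
            return name;
        };
    }

    // Blocking variant: invokeAll returns only after every task has completed.
    static long elapsedWithInvokeAll(List<Callable<String>> tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(tasks.size());
        try {
            long start = System.nanoTime();
            List<Future<String>> futures = pool.invokeAll(tasks);
            for (Future<String> f : futures) {
                if (!f.isDone()) {
                    throw new IllegalStateException("invokeAll returned an unfinished task");
                }
            }
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            pool.shutdown();
        }
    }

    // Non-blocking variant: submit queues each task and returns right away.
    static long elapsedWithSubmit(List<Callable<String>> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(tasks.size());
        try {
            long start = System.nanoTime();
            List<Future<String>> futures = new ArrayList<>();
            for (Callable<String> task : tasks) {
                futures.add(pool.submit(task));
            }
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            // shutdown() stops new submissions; already submitted tasks still run.
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Callable<String>> tasks = Arrays.asList(loadCore("fast", 50), loadCore("slow", 500));
        System.out.println("invokeAll held the caller for ms: " + elapsedWithInvokeAll(tasks));
        System.out.println("submit held the caller for ms: " + elapsedWithSubmit(tasks));
    }
}
```

With invokeAll the caller is held for roughly as long as the slowest task, which is the window in which the HTTP listener cannot bind. With submit the caller continues almost immediately, at the cost of needing some other mechanism to know when each core is usable.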
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)