Hi,

I'm trying to prevent traffic from being sent to a Solr node that is about to shut down, to avoid interruption of service as seen from the various clients. The first part of the puzzle is signaling to any (external) load balancer to stop sending requests to the node. The other part is having SolrJ understand that the node is being stopped, so it does not route internal requests to cores on that node.
Does anyone have a good command of the shutdown logic in Solr? My understanding is a bit sparse, but here is what I can see in the code:

1. bin/solr stop sends a STOP command to Jetty's STOP_PORT with the (not-so-secret) stop key (I've pasted a rough sketch of that stop protocol below my signature)
2. Jetty starts the shutdown process, destroying all servlets and filters, including Solr's dispatchFilter
3. Solr is notified about the shutdown through a callback in CoreContainerProvider
4. CoreContainerProvider#close() is called, which calls CoreContainer#shutdown
5. CoreContainer shuts down every core on the node and then calls ZkController#preClose
6. ZkController#preClose removes the ephemeral live_nodes/myNode znode and then publishes the down state in state.json
7. Solr waits for shutdown of its executors and lets Jetty exit

I could have gotten it wrong, though. I was hoping that a Solr node would first publish itself as "not ready" in ZK before rejecting requests, but it seems this is all reversed, since shutdown is initiated by Jetty?

So could we instead register our own shutdown port in Solr, and let our bin/solr script trigger that one? There we could orchestrate the shutdown as we want (also sketched below my signature):

1. Remove the live_nodes znode in ZK
2. Publish itself as not ready on the api/node/health handler (or a new api/node/ready?)
3. Sleep for a few seconds (or longer, with an optional &shutdownDelay argument to our shutdown endpoint)
4. Trigger server.stop() to take down Jetty and kill the servlet

I filed https://issues.apache.org/jira/browse/SOLR-16722 to discuss a technical solution. The primary goal is to drain traffic right before shutting a node down, but it could also be designed as a generic Readiness Probe <https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-readiness-probes> modeled on Kubernetes?

I'm also aware that any Solr client should be prepared to hit a dead node due to network/power events, and retry. But it won't hurt to be graceful whenever we can.

Happy to hear your thoughts. Is this a made-up problem?

Jan
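PS: For illustration, here is a minimal sketch of the stop command that bin/solr stop ultimately sends. It assumes Jetty's ShutdownMonitor protocol (stop key on one line, the command on the next) and the default bin/solr port and key; treat those values as assumptions and adjust for your install:

import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class JettyStopClient {
    public static void main(String[] args) throws Exception {
        int stopPort = 7983;          // bin/solr default: SOLR_PORT - 1000, i.e. 7983 for 8983
        String stopKey = "solrrocks"; // bin/solr's default (not-so-secret) stop key
        try (Socket socket = new Socket("127.0.0.1", stopPort)) {
            OutputStream out = socket.getOutputStream();
            // ShutdownMonitor protocol: the stop key on one line, then the command
            out.write((stopKey + "\r\nstop\r\n").getBytes(StandardCharsets.UTF_8));
            out.flush();
        }
    }
}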
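And here is roughly how I picture the orchestration from the second list. To be clear, this is a hypothetical sketch, not code that exists in Solr today: LiveNodeRegistry stands in for whatever ZkController hook would remove the ephemeral live_nodes znode, and all the names are made up.

import org.eclipse.jetty.server.Server;

// Stand-in for the ZkController call that removes the ephemeral
// live_nodes/myNode znode; the real hook in Solr may look different.
interface LiveNodeRegistry {
    void removeLiveNode() throws Exception;
}

public class GracefulShutdown {
    private final Server jetty;              // the embedded Jetty server
    private final LiveNodeRegistry liveNodes;
    private volatile boolean ready = true;   // consulted by a readiness handler

    public GracefulShutdown(Server jetty, LiveNodeRegistry liveNodes) {
        this.jetty = jetty;
        this.liveNodes = liveNodes;
    }

    public boolean isReady() {
        return ready;
    }

    // Triggered from our own shutdown port/endpoint instead of Jetty's STOP_PORT
    public void shutdown(long drainMillis) throws Exception {
        liveNodes.removeLiveNode();   // 1. SolrJ stops routing internal requests to this node
        ready = false;                // 2. api/node/health (or a new api/node/ready) reports not ready
        Thread.sleep(drainMillis);    // 3. let external load balancers drain (the &shutdownDelay argument)
        jetty.stop();                 // 4. only now take down Jetty and kill the servlet
    }
}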
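The readiness-probe angle could then be as simple as an endpoint that flips from 200 to 503 once draining starts, which a Kubernetes readinessProbe or a load balancer health check can poll. Again just a sketch, using the JDK's built-in HTTP server types for brevity; Solr has no api/node/ready endpoint today:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;

// Returns 200 while the node is serving and 503 once draining has started,
// so traffic is steered away before Jetty actually goes down.
public class ReadyHandler implements HttpHandler {
    private final GracefulShutdown shutdown; // from the sketch above

    public ReadyHandler(GracefulShutdown shutdown) {
        this.shutdown = shutdown;
    }

    @Override
    public void handle(HttpExchange exchange) throws IOException {
        boolean ready = shutdown.isReady();
        byte[] body = (ready ? "OK" : "DRAINING").getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(ready ? 200 : 503, body.length);
        exchange.getResponseBody().write(body);
        exchange.close();
    }
}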