[ https://issues.apache.org/jira/browse/SOLR-16722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706151#comment-17706151 ]
Jan Høydahl edited comment on SOLR-16722 at 3/29/23 11:56 AM:
--------------------------------------------------------------
Another obvious option would be for the Solr Operator to simply remove the
{{live_nodes}} entry for the node it wants to drain. But I intuitively rejected
it, assuming it would muddy the waters around what {{live_nodes}} means and who
manages it.
However, what if a Solr node, as part of its own shutdown logic, un-published
itself from {{live_nodes}} early and then slept for a configurable time (default
3s?) before actually stopping each core? That would give SolrJ time to receive
the ZK watch event and take the node out of circulation. I have not checked the
code. How long is the delay today between un-publishing in ZK and the actual
shutdown of cores?
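The ordering proposed above (un-publish, wait, then stop cores) can be sketched as a small in-memory simulation. This is not Solr's shutdown code: `LiveNodes`, `DrainingNode`, and the `drain_delay_s` setting are hypothetical names, `LiveNodes` only stands in for the ZooKeeper {{live_nodes}} directory, and the 3s default simply mirrors the value floated in the comment.

```python
import time
from typing import Callable, FrozenSet, List

Watcher = Callable[[FrozenSet[str]], None]


class LiveNodes:
    """Hypothetical in-memory stand-in for ZooKeeper's live_nodes directory.

    Watchers run synchronously here. Real ZooKeeper watches are delivered
    asynchronously, which is the gap a drain delay is meant to cover.
    """

    def __init__(self) -> None:
        self._nodes = set()
        self._watchers: List[Watcher] = []

    def watch(self, callback: Watcher) -> None:
        # Register a watcher and hand it the current state immediately.
        self._watchers.append(callback)
        callback(self.snapshot())

    def snapshot(self) -> FrozenSet[str]:
        return frozenset(self._nodes)

    def publish(self, node: str) -> None:
        self._nodes.add(node)
        self._notify()

    def unpublish(self, node: str) -> None:
        self._nodes.discard(node)
        self._notify()

    def _notify(self) -> None:
        snap = self.snapshot()
        for callback in self._watchers:
            callback(snap)


class DrainingNode:
    """Hypothetical node that leaves live_nodes before stopping its cores."""

    def __init__(self, name: str, live_nodes: LiveNodes,
                 drain_delay_s: float = 3.0) -> None:
        self.name = name
        self.live_nodes = live_nodes
        self.drain_delay_s = drain_delay_s  # hypothetical setting
        self.events: List[str] = []

    def start(self) -> None:
        self.live_nodes.publish(self.name)
        self.events.append("published")

    def shutdown(self, cores: List[str]) -> None:
        # 1. Un-publish first so watchers (SolrJ, load balancers) can stop
        #    routing new requests to this node.
        self.live_nodes.unpublish(self.name)
        self.events.append("unpublished")
        # 2. Wait out the grace period. A real node would keep serving
        #    in-flight and stale-routed requests here; the sketch only sleeps.
        time.sleep(self.drain_delay_s)
        self.events.append("drain delay elapsed")
        # 3. Only now stop each core.
        for core in cores:
            self.events.append(f"stopped {core}")
```

A client would register with `live.watch(...)` and drop the node from its routing table as soon as the callback reports it missing; the sleep is what keeps the cores up while that (in reality asynchronous) notification propagates.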
External load balancers could also update their URL list by watching
{{live_nodes}}.
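An external load balancer that follows {{live_nodes}} could look roughly like the sketch below. It assumes Solr's usual `host:port_context` node-name format (for example `10.0.0.1:8983_solr`) and plain HTTP; `node_name_to_url` and `LiveNodesBalancer` are hypothetical names, and the callback would have to be fed by some ZooKeeper children watch on the {{live_nodes}} path (kazoo's `ChildrenWatch` is one option), which is not shown.

```python
import itertools
import threading
from typing import Iterable, List
from urllib.parse import unquote


def node_name_to_url(node_name: str, scheme: str = "http") -> str:
    # Assumes "host:port_context", e.g. "10.0.0.1:8983_solr"; the context
    # part may be URL-encoded, so decode it. Simplified, not Solr's own code.
    host_port, _, context = node_name.partition("_")
    return f"{scheme}://{host_port}/{unquote(context)}"


class LiveNodesBalancer:
    """Hypothetical round-robin balancer driven by live_nodes changes."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._urls: List[str] = []
        self._counter = itertools.count()

    def on_live_nodes_changed(self, node_names: Iterable[str]) -> None:
        # Called from a live_nodes watch: replace the whole URL list, so an
        # un-published node stops receiving traffic immediately.
        urls = sorted(node_name_to_url(n) for n in node_names)
        with self._lock:
            self._urls = urls

    def next_url(self) -> str:
        with self._lock:
            if not self._urls:
                raise RuntimeError("no live Solr nodes")
            return self._urls[next(self._counter) % len(self._urls)]
```

Replacing the list wholesale on every notification keeps the balancer's view identical to the latest {{live_nodes}} snapshot; a production balancer would also need TLS handling and its own health checks.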
was (Author: janhoy):
Another obvious option would be for the Solr Operator to simply remove the
{{live_nodes}} entry for the node it wants to drain. But I intuitively rejected
it, assuming it would muddy the waters around what {{live_nodes}} means.
I also rejected the idea of attaching a child node or JSON content to the existing
{{live_nodes/foo}} entry, since that would require more watches(?)
> API to flag a solr node NOT READY for requests
> ----------------------------------------------
>
> Key: SOLR-16722
> URL: https://issues.apache.org/jira/browse/SOLR-16722
> Project: Solr
> Issue Type: New Feature
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Jan Høydahl
> Priority: Major
>
> Spin-off from Solr Operator PR
> [https://github.com/apache/solr-operator/issues/529]
> When solr-operator performs a rolling restart or rolling upgrade, it will
> stop one node at a time, but SolrJ (both external and internal) will continue
> sending traffic to the node until requests start failing, since by the time
> SolrJ picks up the "live_nodes" change, it is too late.
> While the operator PR mentioned above will prevent external requests through
> the k8s service to the draining node, it will not prevent internal traffic.
> This issue thus aims to introduce some API or mechanism to flag a Solr node
> as NOT READY for traffic.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)