[
https://issues.apache.org/jira/browse/SOLR-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919350#comment-13919350
]
Ramkumar Aiyengar commented on SOLR-5805:
-----------------------------------------
+1. It would also be good to be able to administratively inform a node that
it is in an unhealthy environment (based on an external health check or
manual intervention); this is something we have been meaning to get to.
Also, depending on the severity, taking a node out of the mix could just mean
that it stops being the shard/overseer leader, stops servicing queries but
continues updating indices, or stops all processing.
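
To make the severity idea concrete, here is a rough sketch of how severity
levels might map to the duties a node keeps. All names (HealthActions,
Severity, Duty, dutiesFor) are made up for illustration and are not existing
Solr API; the three levels simply mirror the three options listed above.

```java
import java.util.EnumSet;
import java.util.Set;

/** Illustration only: hypothetical names, not part of Solr. */
public class HealthActions {
    /** How bad the externally reported (or locally detected) problem is. */
    enum Severity { DEGRADED, UNHEALTHY, FAILED }

    /** Duties a node can keep or give up. */
    enum Duty { LEAD, SERVE_QUERIES, UPDATE_INDEX }

    /** Returns the duties a node would keep performing at a given severity. */
    static Set<Duty> dutiesFor(Severity severity) {
        switch (severity) {
            case DEGRADED:
                // Stop being shard/overseer leader, keep everything else.
                return EnumSet.of(Duty.SERVE_QUERIES, Duty.UPDATE_INDEX);
            case UNHEALTHY:
                // Stop servicing queries but continue updating indices.
                return EnumSet.of(Duty.UPDATE_INDEX);
            case FAILED:
                // Stop all processing.
                return EnumSet.noneOf(Duty.class);
            default:
                throw new IllegalArgumentException("unknown severity: " + severity);
        }
    }
}
```

An administrative "mark unhealthy" call could then carry a Severity and let
the node drop whatever is no longer in dutiesFor(severity).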
> SolrCloud: run a healthcheck in a background thread
> ---------------------------------------------------
>
> Key: SOLR-5805
> URL: https://issues.apache.org/jira/browse/SOLR-5805
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Affects Versions: 4.7
> Reporter: Gregg Donovan
>
> From a [discussion|http://search-lucene.com/m/QTPaJeWIM/] on the mailing list:
> We had a brief SolrCloud outage this weekend when a node's SSD began to fail
> but the node still appeared to be up to the rest of the SolrCloud cluster
> (i.e. still green in clusterstate.json). Distributed queries that reached
> this node would fail but whatever heartbeat keeps the node in the
> clusterstate.json must have continued to succeed.
> We eventually had to power the node down to get it to be removed from
> clusterstate.json.
> Mark Miller:
> "One simple improvement might even be a background thread that periodically
> checks some local readings and depending on the results, pulls itself out of
> the mix as best it can (remove itself from clusterstate.json or simply close
> its zk connection)."
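
A minimal sketch of the background-thread idea, assuming the "local reading"
is a small write probe against the data directory. The class and method names
(BackgroundHealthCheck, canWrite, checkOnce) and the onUnhealthy callback are
hypothetical; in a real node the callback might close the ZooKeeper
connection, but that wiring is not shown here and nothing below is existing
Solr code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustration only: hypothetical class, not part of Solr. */
public class BackgroundHealthCheck {
    private final Path dataDir;
    private final Runnable onUnhealthy; // assumed hook, e.g. "leave the cluster"
    private final AtomicBoolean tripped = new AtomicBoolean(false);
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "healthcheck");
                t.setDaemon(true); // never keep the JVM alive just for the check
                return t;
            });

    public BackgroundHealthCheck(Path dataDir, Runnable onUnhealthy) {
        this.dataDir = dataDir;
        this.onUnhealthy = onUnhealthy;
    }

    /** Local reading: can we create, write, and delete a small file? */
    static boolean canWrite(Path dir) {
        try {
            Path probe = Files.createTempFile(dir, "healthcheck", ".tmp");
            Files.write(probe, new byte[] {1});
            Files.delete(probe);
            return true;
        } catch (IOException | RuntimeException e) {
            return false;
        }
    }

    /** Runs one check; fires the callback only on the first failure. */
    void checkOnce() {
        if (!canWrite(dataDir) && tripped.compareAndSet(false, true)) {
            onUnhealthy.run();
        }
    }

    /** Starts the periodic check on the background thread. */
    public void start(long periodSeconds) {
        scheduler.scheduleWithFixedDelay(this::checkOnce, 0, periodSeconds, TimeUnit.SECONDS);
    }

    public void stop() {
        scheduler.shutdownNow();
    }
}
```

One caveat with this sketch: a failing disk may hang on I/O rather than
return an error, so a real check would probably also need a timeout around
the probe.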