[
https://issues.apache.org/jira/browse/SOLR-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123800#comment-13123800
]
Mark Miller commented on SOLR-2765:
-----------------------------------
The current method of dealing with downed nodes is not so bad: the cluster
layout is compared with the live_nodes, which lets searchers know a node is
down within the ephemeral node timeout. Before that happens (a brief window),
failed requests are simply retried on another replica. The searcher locally
marks the server as bad and periodically tries it again, unless its ephemeral
node goes away, at which point the server is no longer consulted.
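To make that mechanism concrete, here is a minimal Python sketch. It is not
Solr's client code: the ReplicaSelector class, the in-memory layout dict
(shard name to node names), the retry_after interval and the send callback
are all hypothetical stand-ins, and no ZooKeeper calls are made; live_nodes
is simply passed in as a set.

```python
import time
from typing import Callable, Dict, List, Optional, Set


class ReplicaSelector:
    """Hypothetical client-side helper (not a Solr API).

    Picks replicas for a shard from the cluster layout filtered by live_nodes,
    and locally skips servers that recently failed until retry_after elapses.
    """

    def __init__(self, layout: Dict[str, List[str]], retry_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.layout = layout              # shard name -> nodes assigned to it
        self.retry_after = retry_after    # seconds before a "bad" node is retried
        self.clock = clock                # injectable so the sketch is testable
        self.bad: Dict[str, float] = {}   # node -> time it was marked bad

    def candidates(self, shard: str, live_nodes: Set[str]) -> List[str]:
        now = self.clock()
        result = []
        for node in self.layout.get(shard, []):
            if node not in live_nodes:
                # Ephemeral node is gone: stop consulting it, drop the local mark.
                self.bad.pop(node, None)
                continue
            marked = self.bad.get(node)
            if marked is not None and now - marked < self.retry_after:
                continue  # failed recently; skip until the retry interval passes
            result.append(node)
        return result

    def mark_bad(self, node: str) -> None:
        self.bad[node] = self.clock()

    def query(self, shard: str, live_nodes: Set[str],
              send: Callable[[str], str]) -> Optional[str]:
        for node in self.candidates(shard, live_nodes):
            try:
                response = send(node)
            except ConnectionError:
                self.mark_bad(node)  # fall through and retry on the next replica
                continue
            self.bad.pop(node, None)  # a success clears any earlier mark
            return response
        return None  # no live, usable replica for this shard
```

During the brief window before the ephemeral node expires, a failing node is
still "live", so only the local bad mark keeps requests away from it; once it
drops out of live_nodes, the layout filter alone excludes it.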
bq. The client cannot derive this information accurately from simple liveness
information.
It's simply not supported that way currently, though this is intentional. If
you want to change which shards a node is responsible for serving, you don't
just bring it back up with fewer or different shards; you first delete the
node info from the cluster layout, then you bring it up. At the time, we
accepted that a variety of advanced scenarios would require manually editing
the ZooKeeper layout. We do intend to move towards a separate model and state
layout eventually, though (see the SolrCloud wiki page). I think that is
essentially the proposed path.
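A minimal sketch of that ordering, using a hypothetical in-memory dict (shard
name to set of nodes) in place of the real ZooKeeper cluster layout. The
function names are illustrative only, and the sketch assumes, as the comment
implies, that bringing a node up only adds its entries and never prunes stale
ones.

```python
from typing import Dict, Iterable, Set

# Hypothetical in-memory stand-in for the cluster layout: shard -> nodes serving it.
Layout = Dict[str, Set[str]]


def delete_node(layout: Layout, node: str) -> None:
    """Step 1: remove every entry for `node` from the cluster layout."""
    for nodes in layout.values():
        nodes.discard(node)


def bring_up(layout: Layout, node: str, shards: Iterable[str]) -> None:
    """Step 2: the node registers the shards it now serves.

    Assumption: registration only adds entries; it never removes stale ones.
    """
    for shard in shards:
        layout.setdefault(shard, set()).add(node)


def change_node_shards(layout: Layout, node: str, shards: Iterable[str]) -> None:
    """Change which shards `node` serves: delete its old info, then bring it up."""
    delete_node(layout, node)
    bring_up(layout, node, shards)
```

With layout = {"shard1": {"nodeA", "nodeB"}, "shard2": {"nodeA"}}, calling
bring_up(layout, "nodeA", ["shard1"]) alone leaves nodeA listed under shard2,
and since nodeA is live, liveness information cannot reveal that the entry is
stale. change_node_shards(layout, "nodeA", ["shard1"]) removes it.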
I'm biased against an overseer, perhaps even more than against optimistic
collection locks, but I have not had time to fully digest the latest proposed
changes. I suppose that once a solid leader election process is available, an
overseer is fairly cheap and, if used for the right things, fairly simple. When
we get into rebalancing (we don't plan to right away), I suppose we will come
back to it anyhow.
bq. marking replicas as defunct might do,
Yeah, I think this gets complicated to do well in general. I like simple
solutions like the one above. And I think good monitoring is a perfectly
acceptable requirement for a very large cluster.
It's good stuff to consider, but exploring all of these changes should likely
be spun off into another issue. Advancements in how we handle all of this are
a much larger issue than Shard/Node states.
> Shard/Node states
> -----------------
>
> Key: SOLR-2765
> URL: https://issues.apache.org/jira/browse/SOLR-2765
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud, update
> Reporter: Yonik Seeley
> Fix For: 4.0
>
> Attachments: combined.patch, incremental_update.patch,
> scheduled_executors.patch, shard-roles.patch
>
>
> Need state for shards that indicate they are recovering, active/enabled, or
> disabled.