[
https://issues.apache.org/jira/browse/SOLR-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123737#comment-13123737
]
Ted Dunning edited comment on SOLR-2765 at 10/9/11 5:46 PM:
------------------------------------------------------------
{quote}
We don't actually currently remove info about a shard when it goes away
{quote}
That is fine.
But there is a strong distinction between different kinds of information. I see
three kinds of information about a shard that live in different places and are
intended for different audiences (a rough sketch of one possible layout follows
this list):
- the content and status of collections. This is intended as control input for
the overseer and contains information about which collections exist, what
partitions they contain, and other policy information such as the requested
replication level. This is the information that should not be deleted.
- the shard assignments for nodes. This is generated by the overseer and
intended for the nodes. When a node goes down, the assignments for that node
*should* be deleted and other nodes should get other assignments.
- the cluster state. This is generated by the nodes and the overseer and read
by the search clients to let them know who to send queries to. This is the
file that I was referring to. I was suggesting that the clauses that state
that a particular node is serving requests for a particular shard should be
deleted by the overseer when the node disappears. I can't imagine that there
is any question about this because the clients have an urgent need to stop
sending queries to downed nodes as soon as possible. Deleting a clause from
this state doesn't forget about the shard; it just gives an accurate picture of
who is serving a shard.
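To make the separation concrete, here is a rough sketch of how those three kinds
of information (plus the live_nodes liveness data mentioned below) might be laid
out in ZooKeeper. The paths and example contents are purely illustrative
assumptions for this discussion, not the actual SolrCloud layout.
{code:java}
// Illustrative only: hypothetical znode paths for the three kinds of information.
// These names are assumptions for the sake of the example, not SolrCloud's real layout.
public final class ZkLayoutSketch {
    // 1. Collection content/status: control input for the overseer; never deleted
    //    when a node dies.
    //    e.g. /collections/coll1 -> {"numPartitions": 3, "requestedReplication": 2}
    public static final String COLLECTIONS = "/collections";

    // 2. Shard assignments: written by the overseer, read by the nodes; the entry for
    //    a dead node should be deleted and its shards handed to other nodes.
    //    e.g. /assignments/node3 -> ["A", "B", "C"]
    public static final String ASSIGNMENTS = "/assignments";

    // 3. Cluster state: written by the nodes and the overseer, read by search clients
    //    to decide where to send queries; clauses for a downed node are removed promptly.
    //    e.g. /cluster_state -> {"A": ["node3", "node5"], "B": ["node3"], ...}
    public static final String CLUSTER_STATE = "/cluster_state";

    // Liveness: ephemeral znodes created by each node, e.g. /live_nodes/node3.
    public static final String LIVE_NODES = "/live_nodes";

    private ZkLayoutSketch() {}
}
{code}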
I really think that separating the information like this makes it much simpler
to keep track of what is going on, especially for the clients. They should
only need to look in one place for what they need to know. Besides, they can't
really reconcile things correctly. If shards A, B, and C were all served by node 3,
then when node 3 goes down, the client can get things right. But if node 3
comes back and reloads A but not yet B and C, the client cannot know what the
current state is unless somebody updates the state correctly.
In my view, what should happen when node 3 goes down and then back up is this
(a sketch of the overseer's side follows the list):
- the ephemeral for node 3 in live_nodes disappears
- the overseer removes the "node 3 has A, B, C" clause from the cluster state
- the overseer assigns shards A, B and C to other nodes to start the
replication process
- node 6 registers that it is serving shard C
- node 5 registers that it is serving shard B
- node 3 comes back, creates the ephemeral in live_nodes and starts updating
the indexes it has from the transaction log.
- node 3 registers that it has shards A and B up-to-date because there have
been no changes.
- the overseer looks at the cluster state and decides that node 3 is
under-utilized and shard B is over-replicated. It unassigns shard B from node
5 and assigns shard D to node 3.
- node 5 updates the cluster state to indicate it is no longer serving shard B
- node 3 downloads a copy of shard D and replays the log for D to catch up the
archived index.
- node 3 registers that it is serving D
and so on.
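As a rough sketch of the overseer's side of that sequence, assuming the
hypothetical /live_nodes and /cluster_state layout from the earlier sketch, the
overseer could simply watch live_nodes and rewrite the cluster state when a
node's ephemeral disappears. This is not SolrCloud's actual Overseer; the
reconcile body is left as comments because its representation is exactly what
is being discussed.
{code:java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

import java.util.List;

// Hypothetical overseer reaction to live_nodes changes; paths are the assumed layout
// from the earlier sketch, not SolrCloud's real znodes.
public class OverseerSketch implements Watcher {
    private final ZooKeeper zk;

    public OverseerSketch(ZooKeeper zk) {
        this.zk = zk;
    }

    // Set the initial watch on /live_nodes; later changes arrive via process().
    public void start() throws KeeperException, InterruptedException {
        reconcile(zk.getChildren("/live_nodes", this));
    }

    // ZooKeeper fires this when the watched children change, e.g. when node 3's
    // ephemeral vanishes or reappears.
    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeChildrenChanged) {
            try {
                // Re-read the live nodes and re-arm the watch.
                reconcile(zk.getChildren("/live_nodes", this));
            } catch (KeeperException | InterruptedException e) {
                // A real overseer would retry or step down; omitted in this sketch.
            }
        }
    }

    private void reconcile(List<String> liveNodes) {
        // 1. Remove "node X is serving shard Y" clauses from /cluster_state for any node
        //    not in liveNodes, so search clients stop querying downed nodes immediately.
        // 2. Hand the orphaned shards to other live nodes via the assignments data so
        //    replication can bring fresh copies up to date.
        // 3. When a returning node re-registers shards (the A/B case above), rebalance:
        //    drop the now-redundant copy and assign new work (shard D) to the idle node.
    }
}
{code}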
The sequence of states for node 3 in this scenario is:
- serving A, B, C
- down (not serving anything)
- up (not serving anything)
- serving A and B
- serving A, B, and D
The client cannot derive this information accurately from simple liveness
information. Therefore the node has to update the cluster state.
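On the client side, the point is that a search client only ever needs to read
the one cluster-state file. A minimal sketch, again assuming the hypothetical
/cluster_state znode above; parseClusterState is an illustrative placeholder,
not a real SolrJ API.
{code:java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical client-side lookup; the path and the parser are placeholders.
public class ClientSketch {
    // Returns the nodes currently serving the given shard, straight from the cluster state.
    // An empty list simply means nobody is serving that shard right now.
    static List<String> nodesForShard(ZooKeeper zk, String shard)
            throws KeeperException, InterruptedException {
        byte[] raw = zk.getData("/cluster_state", true /* watch for changes */, null);
        Map<String, List<String>> state = parseClusterState(raw);
        return state.getOrDefault(shard, Collections.emptyList());
    }

    // Placeholder: however the cluster state is serialized (e.g. JSON), turn it into
    // a shard -> serving-nodes map.
    static Map<String, List<String>> parseClusterState(byte[] raw) {
        throw new UnsupportedOperationException("illustrative placeholder");
    }
}
{code}
With the overseer keeping the cluster state accurate, the client never has to
combine it with live_nodes or guess at the state of a partially recovered node.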
> Shard/Node states
> -----------------
>
> Key: SOLR-2765
> URL: https://issues.apache.org/jira/browse/SOLR-2765
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud, update
> Reporter: Yonik Seeley
> Fix For: 4.0
>
> Attachments: combined.patch, incremental_update.patch,
> scheduled_executors.patch, shard-roles.patch
>
>
> Need state for shards that indicate they are recovering, active/enabled, or
> disabled.
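As a rough illustration of the per-shard state the issue description asks for
(names here are assumptions, not the final constants), the recovery steps in the
comment above would correspond to a shard sitting in a RECOVERING-like state
until it is registered as serving:
{code:java}
// Illustrative shard-state enum along the lines this issue asks for.
public enum ShardState {
    RECOVERING,  // catching up from the transaction log or a copied index, not yet serving queries
    ACTIVE,      // enabled: serving queries and accepting updates
    DISABLED     // administratively taken out of rotation
}
{code}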