[ 
https://issues.apache.org/jira/browse/SOLR-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123737#comment-13123737
 ] 

Ted Dunning edited comment on SOLR-2765 at 10/9/11 5:46 PM:
------------------------------------------------------------

{quote}
We don't actually currently remove info about a shard when it goes away
{quote}
That is fine.

But there are strong distinctions between the different kinds of information 
involved.  I see three kinds of information about a shard that live in 
different places and are intended for different audiences.

- the content and status of collections.  This is intended as control input for 
the overseer and contains information about which collections exist, what 
partitions they contain, and other policy information such as the requested 
replication level.  This is the information that should not be deleted.

- the shard assignments for nodes.  This is generated by the overseer and 
intended for the nodes.  When a node goes down, the assignments for that node 
*should* be deleted and other nodes should get other assignments.

- the cluster state.  This is generated by the nodes and the overseer and read 
by the search clients to let them know whom to send queries to.  This is the 
file that I was referring to.  I was suggesting that the clauses that state 
that a particular node is serving requests for a particular shard should be 
deleted by the overseer when the node disappears.  I can't imagine that there 
is any question about this because the clients have an urgent need to stop 
sending queries to downed nodes as soon as possible.  Deleting a clause from 
this state doesn't forget about the shard; it just gives an accurate picture of 
who is serving a shard.  A rough sketch of this separation in code follows 
below.
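
To make the separation concrete, here is a minimal sketch using the plain 
ZooKeeper client.  The znode paths (/collections, /node_assignments, 
/cluster_state, /live_nodes), the class name, the node name and the localhost 
connection string are all illustrative assumptions of mine, not the actual 
SolrCloud layout; the point is only that each audience reads or writes a 
different piece of state.

{code:java}
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class StateLayoutSketch {
  // Illustrative paths only; not the actual SolrCloud layout.
  static final String COLLECTIONS   = "/collections";       // policy input; never auto-deleted
  static final String ASSIGNMENTS   = "/node_assignments";  // written by the overseer, read by nodes
  static final String CLUSTER_STATE = "/cluster_state";     // written by nodes/overseer, read by clients
  static final String LIVE_NODES    = "/live_nodes";        // ephemeral liveness markers

  public static void main(String[] args) throws Exception {
    ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> {});

    // A node announces liveness with an ephemeral znode; it vanishes with the
    // session.  Assumes the parent znodes already exist.
    zk.create(LIVE_NODES + "/node3", new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

    // A search client reads only the cluster state to learn who serves what
    // (assumes the overseer has already created that znode).
    byte[] state = zk.getData(CLUSTER_STATE, false, null);
    System.out.println("cluster state: " + new String(state, StandardCharsets.UTF_8));

    zk.close();
  }
}
{code}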

I really think that separating the information like this makes it much simpler 
to keep track of what is going on, especially for the clients.  They should 
only need to look in one place for what they need to know.  Besides, they can't 
really reconcile things correctly on their own.  If shards A, B, and C were all 
served by node 3, then when node 3 goes down, the client can get things right.  
But if node 3 comes back and reloads A but not yet B and C, the client cannot 
know what the current state is unless somebody updates the state correctly.

In my view, what should happen when node 3 goes down and then back up is this:

- the ephemeral for node 3 in live_nodes disappears

- the overseer removes the "node 3 has A, B, C" clause from the cluster state

- the overseer assigns shards A, B and C to other nodes to start the 
replication process

- node 6 registers that it is serving shard C

- node 5 registers that it is serving shard B

- node 3 comes back, creates the ephemeral in live_nodes and starts updating 
the indexes it has from the transaction log.

- node 3 registers that it has shards A and B up-to-date because there have 
been no changes.

- the overseer looks at the cluster state and decides that node 3 is 
under-utilized and shard B is over-replicated.  It unassigns shard B from node 
5 and assigns shard D to node 3.

- node 5 updates the cluster state to indicate it is no longer serving shard B

- node 3 downloads a copy of shard D and replays the log for D to catch up the 
archived index.

- node 3 registers that it is serving D

and so on.
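
A minimal sketch of the overseer's part of that sequence, continuing the same 
assumptions (hypothetical paths, and a simplified layout with one child znode 
per node under /cluster_state whose data lists the shards it serves; the real 
state would be richer than that):

{code:java}
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class OverseerSketch implements Watcher {
  private final ZooKeeper zk;

  public OverseerSketch(ZooKeeper zk) { this.zk = zk; }

  /** Re-arm the watch on live_nodes and prune cluster-state clauses for vanished nodes. */
  public void checkLiveNodes() throws KeeperException, InterruptedException {
    Set<String> live = new HashSet<>(zk.getChildren("/live_nodes", this));
    List<String> serving = zk.getChildren("/cluster_state", false);
    for (String node : serving) {
      if (!live.contains(node)) {
        // The node's ephemeral is gone: clients must stop sending it queries,
        // so its "node X is serving shards ..." clause is removed immediately.
        zk.delete("/cluster_state/" + node, -1);
        // The overseer would now hand that node's shards to other nodes via
        // /node_assignments so replication can start (omitted in this sketch).
      }
    }
  }

  @Override
  public void process(WatchedEvent event) {
    if (event.getType() == Watcher.Event.EventType.NodeChildrenChanged) {
      try {
        checkLiveNodes();
      } catch (KeeperException | InterruptedException e) {
        throw new RuntimeException(e);
      }
    }
  }
}
{code}

checkLiveNodes() would be called once at startup to arm the first watch; each 
children-changed event on live_nodes then re-arms it.  The reassignment step is 
left as a comment because how shards get handed to other nodes is exactly the 
policy question the collections information is meant to drive.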

The sequence of states for node 3 in this scenario is

- serving A, B, and C
- down (not serving anything)
- up (not serving anything)
- serving A and B
- serving A, B, and D

The client cannot derive this information accurately from simple liveness 
information.  Therefore the node has to update the cluster state.
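
In code terms, that just means each node writes its own serving clauses.  A 
hedged sketch with the same hypothetical layout, keeping the shard list as a 
comma-separated string for brevity:

{code:java}
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class NodeRegistrationSketch {
  /** Publish or update the list of shards this node is currently serving. */
  public static void registerServing(ZooKeeper zk, String nodeName, String shards)
      throws KeeperException, InterruptedException {
    String path = "/cluster_state/" + nodeName;
    byte[] data = shards.getBytes(StandardCharsets.UTF_8);
    if (zk.exists(path, false) == null) {
      // Persistent, not ephemeral: in this model the overseer deletes the
      // clause when the node dies, rather than relying on session expiry.
      zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    } else {
      zk.setData(path, data, -1);
    }
  }
}
{code}

Node 3 in the scenario above would call registerServing(zk, "node3", "A,B") 
once its transaction log replay finished, and registerServing(zk, "node3", 
"A,B,D") once shard D was caught up.  Liveness alone could never convey that 
progression, which is why the node itself has to write it.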

                
> Shard/Node states
> -----------------
>
>                 Key: SOLR-2765
>                 URL: https://issues.apache.org/jira/browse/SOLR-2765
>             Project: Solr
>          Issue Type: Sub-task
>          Components: SolrCloud, update
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>         Attachments: combined.patch, incremental_update.patch, 
> scheduled_executors.patch, shard-roles.patch
>
>
> Need state for shards that indicate they are recovering, active/enabled, or 
> disabled.
