[ https://issues.apache.org/jira/browse/SOLR-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237904#comment-14237904 ]

Erick Erickson commented on SOLR-6691:
--------------------------------------

[~noble.paul] Here's my promised note.

The code for figuring out who's "the guy in front" seems to have a problem 
in my case. Pinging here since every other time I've had problems here it's been 
a self-inflicted wound....

But this time I _swear_ I have some evidence....

Since the sort is sensitive to the session ID when two nodes have the same 
sequence ID, their relative order "depends". Note that because some of my tests 
run on a single Solr instance that just rearranges shard leadership, I can have 
identical sessions, but the principle is the same for the Overseer. 

So let's say core_node2 joins at head. Depending on its session, it may sort 
before or after the existing node with sequence 0000001. This may never actually 
be a problem for Overseer election, though: can a node rejoin at head without 
_also_ getting a new session ID that's greater than any other in the election 
queue? If it can't, the rejoining node will _always_ sort after the other node 
with the same sequence ID and this case will not occur. But for shard leader 
election on a single node hosting, say, 6 replicas, it definitely happens.
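To make the tie-break concrete, here is a small Java sketch. It assumes election node names shaped like `<session>-<coreNode>-n_<seq>` (as in the examples that follow) and assumes an ordering of "sequence number first, then the whole name as a string", which is one way the session ID could end up deciding ties. It is not taken from Solr's LeaderElector; `ElectionOrderDemo`, `seqOf` and `ORDER` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only; not Solr's actual LeaderElector code.
final class ElectionOrderDemo {

    // Hypothetical helper: extracts the sequence number from a name shaped like
    // "<session>-<coreNode>-n_<seq>".
    static int seqOf(String name) {
        return Integer.parseInt(name.substring(name.lastIndexOf("n_") + 2));
    }

    // Assumed ordering: by sequence number, then by the whole name as a string,
    // so the session ID prefix decides ties between equal sequence numbers.
    static final Comparator<String> ORDER = (a, b) -> {
        int bySeq = Integer.compare(seqOf(a), seqOf(b));
        return bySeq != 0 ? bySeq : a.compareTo(b);
    };

    static List<String> sorted(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(ORDER);
        return copy;
    }

    public static void main(String[] args) {
        // core2 holds session2, so it sorts ahead of core3 (session3).
        System.out.println(sorted(List.of(
                "session4-core4-n_0000002", "session3-core3-n_0000001",
                "session2-core2-n_0000001", "session1-core1-n_0000000")));
        // core3 holds session12; as strings "session12" < "session3",
        // so core3 now sorts ahead of core2.
        System.out.println(sorted(List.of(
                "session4-core4-n_0000002", "session3-core2-n_0000001",
                "session12-core3-n_0000001", "session1-core1-n_0000000")));
    }
}
```

Running `main` prints the two orderings used in the examples; nothing but the session prefix differs between them.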

Anyway, if core_node2 rejoins at head, it can look like either of these:

session1-core1-n_0000000
session2-core2-n_0000001
session3-core3-n_0000001
session4-core4-n_0000002

or

session1-core1-n_0000000
session12-core3-n_0000001
session3-core2-n_0000001
session4-core4-n_0000002

The problem here is that the LeaderElector code finds the index of the first 
node _after_ the current sequence number and then backs up two. So if core2 is 
looking for the "guy in front" in the first case, the first node after 0000001 
is core4 at index 3, and backing up two lands on index 1, which is core2 itself: 
it'll watch itself. In the second case it'll watch core3, as it should.
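The "back up two" lookup can be reproduced with a few lines of Java. This is a hedged sketch that mimics the description above rather than a copy of LeaderElector: it takes an already-sorted list (the two orderings above), finds the first node whose sequence number is greater than the caller's, steps back two positions, and reports whether the caller ends up watching itself. `GuyInFrontDemo`, `guyInFrontBackUpTwo`, `watchesItself` and `seqOf` are hypothetical names.

```java
import java.util.List;

// Illustrative sketch of the lookup described above; not Solr's actual code.
final class GuyInFrontDemo {

    // Hypothetical helper: sequence number from "<session>-<coreNode>-n_<seq>".
    static int seqOf(String name) {
        return Integer.parseInt(name.substring(name.lastIndexOf("n_") + 2));
    }

    // Find the index of the first node *after* mySeq, then back up two.
    // Returns null when there are fewer than two nodes at or before mySeq.
    static String guyInFrontBackUpTwo(List<String> sorted, int mySeq) {
        int after = 0;
        while (after < sorted.size() && seqOf(sorted.get(after)) <= mySeq) {
            after++;
        }
        return after >= 2 ? sorted.get(after - 2) : null;
    }

    // True when the positional lookup hands the caller its own node.
    static boolean watchesItself(List<String> sorted, String myName) {
        return myName.equals(guyInFrontBackUpTwo(sorted, seqOf(myName)));
    }

    public static void main(String[] args) {
        List<String> first = List.of(
                "session1-core1-n_0000000", "session2-core2-n_0000001",
                "session3-core3-n_0000001", "session4-core4-n_0000002");
        List<String> second = List.of(
                "session1-core1-n_0000000", "session12-core3-n_0000001",
                "session3-core2-n_0000001", "session4-core4-n_0000002");

        // First ordering: index 3 is the first node after 0000001,
        // so backing up two lands on index 1, which is core2 itself.
        System.out.println(guyInFrontBackUpTwo(first, 1)
                + " self=" + watchesItself(first, "session2-core2-n_0000001"));
        // Second ordering: backing up two lands on core3.
        System.out.println(guyInFrontBackUpTwo(second, 1)
                + " self=" + watchesItself(second, "session3-core2-n_0000001"));
    }
}
```

Because the result depends only on positions, when two nodes share a sequence number, whichever of them sorts first ends up "watching" itself.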

I've got what I think is a solution, but I have to beat it to death for a while 
first. At this point I'm mainly asking whether this analysis is sound.

> REBALANCELEADERS needs to change the leader election queue.
> -----------------------------------------------------------
>
>                 Key: SOLR-6691
>                 URL: https://issues.apache.org/jira/browse/SOLR-6691
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>
> The original code (SOLR-6517) assumed that changes in the clusterstate after 
> issuing a command to the overseer to change the leader indicated that the 
> leader was successfully changed. Fortunately, Noble clued me in that this 
> isn't the case and that the potential leader needs to insert itself in the 
> leader election queue before triggering the change leader command.
> Inserting themselves in the front of the queue should probably happen in 
> BALANCESHARDUNIQUE when the preferredLeader property is assigned as well.
> [~noble.paul] Do evil things happen if a node joins at the head but it's 
> _already_ in the queue? These ephemeral nodes in the queue are watching each 
> other. So if node1 is the leader you have
> node1 <- node2 <- node3 <- node4
> where <- means "watches".
> Now, if node3 puts itself at the head of the list, you have
> {code}
> node1 <- node2
>       <- node3 <- node4
> {code}
> I _think_ when I was looking at this it all "just worked". 
> 1> node 1 goes down. Nodes 2 and 3 duke it out but there's code to ensure 
> that node3 becomes the leader and node2 inserts itself at the end so it's 
> watching node 4.
> 2> node 2 goes down, nobody gets notified and it doesn't matter.
> 3> node 3 goes down, node 4 gets notified and starts watching node 2 by 
> inserting itself at the end of the list.
> 4> node 4 goes down, nobody gets notified and it doesn't matter.
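The "who gets notified" part of the four cases can be checked against the diagram with a tiny Java model. It only encodes the watch edges drawn above after node3 puts itself at the head (node2 and node3 watching node1, node4 watching node3) and asks which watchers fire when a given node goes down; it deliberately ignores re-election and re-watching. `WatchGraphDemo` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model of the watch edges in the diagram; not Solr code.
final class WatchGraphDemo {

    // watcher -> node it watches, after node3 has joined at the head.
    static Map<String, String> afterNode3JoinsAtHead() {
        Map<String, String> watches = new LinkedHashMap<>();
        watches.put("node2", "node1");
        watches.put("node3", "node1");
        watches.put("node4", "node3");
        return watches;
    }

    // Returns the watchers whose watch fires when `down` disappears.
    static List<String> notifiedWhenDown(Map<String, String> watches, String down) {
        List<String> notified = new ArrayList<>();
        for (Map.Entry<String, String> e : watches.entrySet()) {
            if (e.getValue().equals(down)) {
                notified.add(e.getKey());
            }
        }
        return notified;
    }

    public static void main(String[] args) {
        Map<String, String> watches = afterNode3JoinsAtHead();
        for (String down : List.of("node1", "node2", "node3", "node4")) {
            System.out.println(down + " goes down -> notified: "
                    + notifiedWhenDown(watches, down));
        }
    }
}
```

Running it reports node2 and node3 for node1, node4 for node3, and nobody for node2 or node4, which lines up with cases 1> through 4>.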



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
