[jira] [Commented] (SOLR-8069) Leader Initiated Recovery can put the replica with the latest data into LIR and a shard will have no leader even on restart.

Mark Miller (JIRA) Thu, 17 Sep 2015 16:09:15 -0700

    [ https://issues.apache.org/jira/browse/SOLR-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14804686#comment-14804686 ]


Mark Miller commented on SOLR-8069:
-----------------------------------

I think the thought experiment comes down to this:

We check whether we locally think we are the leader (which requires being 
connected to ZooKeeper).

We get the current leader context.

We check again whether we locally think we are the leader.

If all that passes, we assume we have the context from when we were the 
leader. Publishing then only succeeds if that same leader is still registered.
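A minimal sketch of the check / get context / check sequence, assuming a hypothetical in-memory `ClusterView` in place of real ZooKeeper state. `LeaderContext`, `ClusterView`, `context_for_publish`, and the core names are illustrative only, not Solr's API. The second check is what rejects a context read while leadership (or the connection) was being lost.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class LeaderContext:
    # Hypothetical snapshot of the leader registration as read from ZooKeeper.
    leader: str
    version: int


@dataclass
class ClusterView:
    # Hypothetical local view of cluster state (not a Solr class).
    connected: bool
    leader: str
    version: int


def locally_think_leader(view: ClusterView, me: str) -> bool:
    # Believing we are the leader only counts while connected to ZooKeeper.
    return view.connected and view.leader == me


def context_for_publish(
    view: ClusterView,
    me: str,
    fetch_context: Callable[[ClusterView], LeaderContext],
) -> Optional[LeaderContext]:
    """Return a leader context only if we think we lead before AND after reading it."""
    if not locally_think_leader(view, me):
        return None
    ctx = fetch_context(view)
    if not locally_think_leader(view, me):
        # Leadership or the connection changed while reading: do not publish.
        return None
    return ctx


def read_context(view: ClusterView) -> LeaderContext:
    return LeaderContext(view.leader, view.version)


def read_then_lose_session(view: ClusterView) -> LeaderContext:
    # Simulates the ZooKeeper session dropping right after the read.
    ctx = LeaderContext(view.leader, view.version)
    view.connected = False
    return ctx
```

With `read_context` the leader gets its context back; with `read_then_lose_session` the second check fails and nothing is returned. The double check only narrows the window, though: the final guarantee still has to come from the conditional write at publish time.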

So where are the holes?

There doesn't seem to be much room to get the wrong context, does there? In 
what scenario could we think we are the leader both before and after the 
getContext call and still end up with the wrong context?

And if we have the leader's context, the multi update ensures the update only 
happens while that context still represents the leader.
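A sketch of the guarded multi update, assuming a hypothetical in-memory `VersionedStore` that imitates ZooKeeper's versioned nodes and its convention that version -1 matches any version of an existing node. The leader's registration is modeled as a node that disappears when its session expires; the LIR write is applied only if that node still exists. `publish_lir_state` and the paths used with it are illustrative, not Solr's actual code or znode layout.

```python
from typing import Dict, List, Optional, Tuple


class TransactionFailed(Exception):
    """Raised when any check in a multi fails; no write is applied."""


class VersionedStore:
    """Hypothetical in-memory imitation of versioned ZooKeeper nodes."""

    ANY_VERSION = -1  # like ZooKeeper: -1 matches any version of an existing node

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._version: Dict[str, int] = {}

    def create(self, path: str, value: str) -> None:
        self._data[path] = value
        self._version[path] = 0

    def delete(self, path: str) -> None:
        # Stands in for an ephemeral node vanishing when its session expires.
        self._data.pop(path, None)
        self._version.pop(path, None)

    def get(self, path: str) -> Optional[str]:
        return self._data.get(path)

    def multi(self, checks: List[Tuple[str, int]], writes: List[Tuple[str, str]]) -> None:
        # Single-threaded, so validating every check before any write gives
        # all-or-nothing behaviour; ZooKeeper does this atomically server-side.
        for path, expected in checks:
            if path not in self._version:
                raise TransactionFailed(f"no node {path}")
            if expected != self.ANY_VERSION and self._version[path] != expected:
                raise TransactionFailed(f"bad version for {path}")
        for path, value in writes:
            # Simplification: a write creates the node or bumps its version.
            self._version[path] = self._version.get(path, -1) + 1
            self._data[path] = value


def publish_lir_state(store: VersionedStore, leader_node: str, lir_node: str, state: str) -> None:
    """Write a replica's LIR state only if the leader's registration node still exists."""
    store.multi(checks=[(leader_node, VersionedStore.ANY_VERSION)], writes=[(lir_node, state)])
```

For example, after `store.create("/leader/shard1", "core_node1")` a call to `publish_lir_state(store, "/leader/shard1", "/lir/shard1/core_node2", "down")` succeeds; once `/leader/shard1` is deleted (the session expired), the same call raises `TransactionFailed` and the stored LIR state is left unchanged.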

> Leader Initiated Recovery can put the replica with the latest data into LIR 
> and a shard will have no leader even on restart.
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-8069
>                 URL: https://issues.apache.org/jira/browse/SOLR-8069
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Mark Miller
>         Attachments: SOLR-8069.patch, SOLR-8069.patch
>
>
> I've seen this twice now. Need to work on a test.
> When some issues hit all the replicas at once, you can end up in a situation 
> where the rightful leader was put or put itself into LIR. Even on restart, 
> this rightful leader won't take leadership and you have to manually clear the 
> LIR nodes.
> It seems that if all the replicas participate in election on startup, LIR 
> should just be cleared.
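The startup rule proposed in the issue description (clear LIR when every replica takes part in the election) can be sketched as a small pure function. `lir_after_startup_election` and its inputs are hypothetical; how a node would actually learn the set of election participants is not covered here.

```python
from typing import Dict, Set


def lir_after_startup_election(
    replicas: Set[str],
    election_participants: Set[str],
    lir_states: Dict[str, str],
) -> Dict[str, str]:
    """Return the LIR entries to keep after startup (hypothetical helper).

    Proposed rule: if every replica of the shard is taking part in the
    leader election on startup, clear LIR entirely; otherwise keep the
    existing LIR entries untouched.
    """
    # Require a non-empty replica set so "all of nothing" never clears LIR.
    if replicas and replicas <= election_participants:
        return {}
    return dict(lir_states)
```

Returning a copy rather than mutating `lir_states` keeps the decision separate from whatever code would then delete the LIR nodes.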



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
