[ https://issues.apache.org/jira/browse/SOLR-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14804686#comment-14804686 ]
Mark Miller commented on SOLR-8069:
-----------------------------------

I think the thought game comes down to:

We check if we locally think we are the leader (which requires being connected to zk).
We get the current leader context.
We check again if we locally think we are the leader.

If all that passes, we assume we have the context from when we were the leader. Now publishing only works if that same leader is still registered.

So where are the holes? There does not seem to be a lot of room to get the wrong context. In what scenario could we think we are the leader both before and after the getContext call and still end up with the wrong context?

And if we have the leader's context, the multi update ensures the update only happens if that context is still the leader.

> Leader Initiated Recovery can put the replica with the latest data into LIR
> and a shard will have no leader even on restart.
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-8069
>                 URL: https://issues.apache.org/jira/browse/SOLR-8069
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Mark Miller
>         Attachments: SOLR-8069.patch, SOLR-8069.patch
>
>
> I've seen this twice now. Need to work on a test.
> When some issues hit all the replicas at once, you can end up in a situation
> where the rightful leader was put or put itself into LIR. Even on restart,
> this rightful leader won't take leadership and you have to manually clear the
> LIR nodes.
> It seems that if all the replicas participate in election on startup, LIR
> should just be cleared.
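
The check, get context, check again sequence from the comment, followed by a publish that is conditional on the same leader still being registered, can be sketched roughly as below. This is only an in-memory model for reasoning about the race, not the code in the attached patch: `Cluster`, `LeaderContext`, `contextIfLeader`, `publishLirIfStillLeader` and the epoch counter are all hypothetical names, and the `synchronized` method merely stands in for the atomicity that the ZooKeeper multi update is described as providing.

```java
import java.util.Optional;

/** Illustrative in-memory model; none of these names are Solr or ZooKeeper APIs. */
final class LirGuardSketch {

    /** Hypothetical snapshot of who was registered as leader, and in which epoch. */
    static final class LeaderContext {
        final String nodeId;
        final long epoch;

        LeaderContext(String nodeId, long epoch) {
            this.nodeId = nodeId;
            this.epoch = epoch;
        }
    }

    /** Hypothetical stand-in for the leader registration state kept in zk. */
    static final class Cluster {
        private String leaderId;   // node currently registered as leader, or null
        private long leaderEpoch;  // bumped on every new registration
        private String lirTarget;  // replica most recently put into LIR, or null

        synchronized void registerLeader(String nodeId) {
            leaderId = nodeId;
            leaderEpoch++;
        }

        synchronized boolean isLeader(String nodeId) {
            return nodeId.equals(leaderId);
        }

        synchronized LeaderContext currentContext() {
            return new LeaderContext(leaderId, leaderEpoch);
        }

        /** Models the multi update: the leader check and the write are one atomic step. */
        synchronized boolean publishLirIfStillLeader(LeaderContext ctx, String replica) {
            if (!ctx.nodeId.equals(leaderId) || ctx.epoch != leaderEpoch) {
                return false; // the registered leader changed, so the write is rejected
            }
            lirTarget = replica;
            return true;
        }

        synchronized String lirTarget() {
            return lirTarget;
        }
    }

    /** Step 1: check leadership. Step 2: get the context. Step 3: check leadership again. */
    static Optional<LeaderContext> contextIfLeader(Cluster cluster, String self) {
        if (!cluster.isLeader(self)) {
            return Optional.empty();
        }
        LeaderContext ctx = cluster.currentContext();
        if (!cluster.isLeader(self)) {
            return Optional.empty();
        }
        return Optional.of(ctx);
    }

    public static void main(String[] args) {
        Cluster cluster = new Cluster();
        cluster.registerLeader("node1");
        LeaderContext ctx = contextIfLeader(cluster, "node1").get();

        // The same leader is still registered, so the publish goes through.
        System.out.println(cluster.publishLirIfStillLeader(ctx, "node2")); // prints "true"

        // Leadership moves on; the stale context can no longer publish.
        cluster.registerLeader("node2");
        System.out.println(cluster.publishLirIfStillLeader(ctx, "node3")); // prints "false"
    }
}
```

In this model a context captured while node1 was leader becomes useless as soon as another registration happens, which is the property the conditional publish relies on. Whether the real getContext call can return a mismatched context between the two local checks is exactly the open question in the comment, and the sketch does not settle it.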