[ https://issues.apache.org/jira/browse/SOLR-8075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003133#comment-15003133 ]
Gregory Chanan commented on SOLR-8075:
--------------------------------------
Patch looks good, thanks Mark!
A comment outside the scope of this JIRA (I know this is pre-existing logic),
but which I can't find a better place for:
{code}
+ // We can do this before registering as leader because only setting DOWN requires that
+ // we are leader, and here we are setting ACTIVE
+ zkController.updateLeaderInitiatedRecoveryState(collection, shardId,
+     leaderProps.getStr(ZkStateReader.CORE_NODE_NAME_PROP), Replica.State.ACTIVE, core.getCoreDescriptor(), true);
{code}
This seems difficult to reason about, given that there are multiple
non-commutative writers potentially racing here: a leader setting DOWN and this
node setting ACTIVE. It would be easier to reason about if there were two
separate states, each with a single writer:
1) the leader's view of the world
2) the replica's view of the world (i.e. telling the Overseer "I know the leader
thinks I'm in LIR, but I know some special information, and I'm telling you that
for this election # I'm OK"). That could go in the ZkNodeProps sent to the
Overseer (or in a separate znode), and the Overseer could apply the correct
logic with it.
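
To make the two-view idea a bit more concrete, here is a rough sketch. Everything
in it is hypothetical: LirViewsSketch, LeaderView, ReplicaView and resolve() do
not exist in Solr or in the patch, and the merge rule (a replica's "I'm OK" claim
wins only if it was made for a newer election than the leader's DOWN mark) is just
one assumption about what the "correct logic" in the Overseer could look like.

```java
/** Hypothetical sketch only; none of these types exist in Solr. */
public class LirViewsSketch {

    /** Replica states relevant to this sketch. */
    public enum State { DOWN, ACTIVE }

    /** Written only by the shard leader: its opinion of a replica. */
    public static final class LeaderView {
        final State state;
        final int electionNumber; // election in which the writing leader was elected

        public LeaderView(State state, int electionNumber) {
            this.state = state;
            this.electionNumber = electionNumber;
        }
    }

    /** Written only by the replica itself: what it claims for a given election. */
    public static final class ReplicaView {
        final boolean claimsOk;   // e.g. "I synced with my full shard"
        final int electionNumber; // election the claim applies to

        public ReplicaView(boolean claimsOk, int electionNumber) {
            this.claimsOk = claimsOk;
            this.electionNumber = electionNumber;
        }
    }

    /**
     * Overseer-side merge of the two views (assumed rule, see lead-in).
     * Each view has a single writer, so the result does not depend on
     * the order in which the two writes arrived.
     */
    public static State resolve(LeaderView leader, ReplicaView replica) {
        if (leader == null || leader.state == State.ACTIVE) {
            return State.ACTIVE; // no LIR mark to override
        }
        // Leader says DOWN: only a claim for a strictly newer election overrides it.
        boolean replicaOverrides = replica != null
                && replica.claimsOk
                && replica.electionNumber > leader.electionNumber;
        return replicaOverrides ? State.ACTIVE : State.DOWN;
    }

    public static void main(String[] args) {
        LeaderView staleDown = new LeaderView(State.DOWN, 3);
        // Claim made for a newer election than the DOWN mark.
        System.out.println(resolve(staleDown, new ReplicaView(true, 4)));
        // Claim made for the same election as the DOWN mark.
        System.out.println(resolve(staleDown, new ReplicaView(true, 3)));
    }
}
```

The point of the sketch is only that the leader and the replica never write the
same field, so the DOWN/ACTIVE race above turns into a deterministic comparison
done in one place.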
Anyway, this is outside the scope of this JIRA; I just wanted to jot my thoughts
down. If you think this is a valid improvement, I can file a JIRA for it.
> Leader Initiated Recovery should not stop a leader that participated in an
> election with all of its replicas from becoming a valid leader.
> -------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-8075
> URL: https://issues.apache.org/jira/browse/SOLR-8075
> Project: Solr
> Issue Type: Bug
> Reporter: Mark Miller
> Assignee: Mark Miller
> Fix For: 5.4, Trunk
>
> Attachments: SOLR-8075.patch, SOLR-8075.patch, SOLR-8075.patch,
> SOLR-8075.patch, SOLR-8075.patch, SOLR-8075.patch, SOLR-8075.patch
>
>
> Currently, because of SOLR-8069, all the replicas in a shard can be put into
> LIR.
> If you restart such a shard, the valid leader will win the election and
> sync with the shard and then be blocked from registering as ACTIVE because it
> is in LIR.
> I think that is a little wonky because I don't think it even tries another
> candidate because the leader that cannot publish ACTIVE does not have its
> election canceled.
> While SOLR-8069 should prevent this situation, we should add logic to allow a
> leader that can sync with its full shard to become leader and publish ACTIVE
> regardless of LIR.