ASF subversion and git services commented on SOLR-9446:

Commit 15cee3141c160c38756ceed73bd1cd88002c01c9 in lucene-solr's branch 
refs/heads/master from [~noble.paul]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=15cee31 ]

SOLR-9446: Leader failure after creating a freshly replicated index can send 
nodes into recovery even if index was not changed

> Leader failure after creating a freshly replicated index can send nodes into 
> recovery even if index was not changed
> -------------------------------------------------------------------------------------------------------------------
>                 Key: SOLR-9446
>                 URL: https://issues.apache.org/jira/browse/SOLR-9446
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: replication (java)
>            Reporter: Pushkar Raste
>            Assignee: Noble Paul
>            Priority: Minor
>         Attachments: SOLR-9446.patch
>  We noticed this issue while migrating a Solr index from machines {{A1, A2 and 
> A3}} to {{B1, B2, B3}}. We followed these steps (there were no updates 
> during the migration process).
> * Index had replicas on machines {{A1, A2, A3}}. Let's say {{A1}} was the 
> leader at the time
> * We added 3 more replicas {{B1, B2 and B3}}. These nodes synced with the 
> leader by replication. These fresh nodes do not have tlogs.
> * We shut down one of the old nodes ({{A3}}). 
> * We then shut down the leader ({{A1}})
> * A new leader was elected (let's say {{A2}})
> * Leader asked all the replicas to sync with it
> * Fresh nodes (ones without tlogs) first tried PeerSync, but since there was 
> no frame of reference, PeerSync failed and the fresh nodes fell back to 
> replication 
> Although replication would not copy all the segments again, it seems like we 
> can short-circuit the sync to put nodes back in the active state as soon as 
> possible. 
> If a freshly replicated index becomes the leader for some reason, it can 
> still send nodes (both other freshly replicated indexes and old replicas) 
> into recovery. Here is the scenario:
> * A freshly replicated index becomes the leader.
> * The new leader, however, asks all the replicas to sync with it.
> * Replicas (including old ones) ask for versions from the leader, but the 
> leader has no update logs, hence the replicas cannot compute missing versions 
> and fall back to replication
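
The short circuit proposed above can be sketched roughly as follows. This is only an illustration of the idea, not the code in the attached patch: the class `SyncSketch`, the `NodeState` type, the `indexFingerprint` field and the `decide` method are all hypothetical names, and the sketch assumes that each node can cheaply produce some summary value of its index that is equal whenever two indexes hold the same content.

```java
import java.util.List;

public class SyncSketch {

    // Possible results of a sync attempt in this simplified model.
    enum Outcome { IN_SYNC, PEER_SYNC, FULL_REPLICATION }

    // Hypothetical snapshot of a node; this is not a Solr class.
    static final class NodeState {
        final List<Long> recentVersions; // versions read from the tlog, empty on fresh nodes
        final long indexFingerprint;     // assumed summary of the index content

        NodeState(List<Long> recentVersions, long indexFingerprint) {
            this.recentVersions = recentVersions;
            this.indexFingerprint = indexFingerprint;
        }
    }

    // Decide how a replica should sync with the leader.
    static Outcome decide(NodeState replica, NodeState leader) {
        // Short circuit: identical index content means nothing needs to be fetched,
        // even when one side has no update log to compare versions against.
        if (replica.indexFingerprint == leader.indexFingerprint) {
            return Outcome.IN_SYNC;
        }
        // Without versions on either side there is no frame of reference for a
        // version-based sync, so the only remaining option is replication.
        if (replica.recentVersions.isEmpty() || leader.recentVersions.isEmpty()) {
            return Outcome.FULL_REPLICATION;
        }
        return Outcome.PEER_SYNC;
    }

    public static void main(String[] args) {
        // A freshly replicated node (no tlog) became the leader; an old replica syncs with it.
        NodeState freshLeader = new NodeState(List.of(), 42L);
        NodeState oldReplica = new NodeState(List.of(101L, 102L), 42L);
        System.out.println(decide(oldReplica, freshLeader)); // prints IN_SYNC

        // Same situation, but the index content really differs.
        NodeState staleReplica = new NodeState(List.of(101L), 7L);
        System.out.println(decide(staleReplica, freshLeader)); // prints FULL_REPLICATION
    }
}
```

In this model both scenarios from the description end in `IN_SYNC`, because the index was not changed during the migration; without the first check they would fall through to `FULL_REPLICATION`.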

This message was sent by Atlassian JIRA

