[jira] [Comment Edited] (SOLR-8619) A new replica should not become leader when all current replicas are down as it leads to data loss

Jason Gerlowski (JIRA) Thu, 28 Jan 2016 18:23:06 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15122781#comment-15122781
 ]


Jason Gerlowski edited comment on SOLR-8619 at 1/29/16 2:22 AM:
----------------------------------------------------------------

Throwing in my 2 cents.  New to SolrCloud, so feel free to ignore...

+1 for having a check to ensure that a replica isn't marked as a leader unless 
it's had a chance to sync with a leader.

+1 for having ADDREPLICA calls fail if there are no active replicas.  I'd be 
fine with allowing API users to create not-ready-for-leadership replicas if 
there was a great way of conveying that caveat to them.  But short of adding a 
replica-state option to CLUSTERSTATUS to convey this caveat, I can't think of a 
good way to do this.  IMO, it seems cleaner conceptually to prevent users up 
front from getting into this state.  Bit hand-wavy though, so take this 
rationale with a grain of salt.
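
To make the second +1 concrete, here is a minimal sketch of an up-front 
ADDREPLICA check. This is a toy model and not Solr code: the Replica class, 
the state strings, NoActiveReplicaError and add_replica() are all 
hypothetical names invented for illustration. It assumes an empty shard may 
still accept its first replica.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    state: str  # simplified, assumed states: "active", "recovering", "down"


class NoActiveReplicaError(Exception):
    """Raised when a new replica would have no active replica to sync from."""


def add_replica(shard_replicas, new_name):
    """Hypothetical ADDREPLICA guard (illustration only, not Solr's API)."""
    has_existing = len(shard_replicas) > 0
    has_active = any(r.state == "active" for r in shard_replicas)
    # Reject only when replicas exist but none of them is active; a brand-new
    # shard with no replicas at all has nothing to lose.
    if has_existing and not has_active:
        raise NoActiveReplicaError(
            f"cannot add {new_name}: all existing replicas are down"
        )
    new_replica = Replica(new_name, "recovering")
    shard_replicas.append(new_replica)
    return new_replica
```

With a check like this the caller gets an immediate error instead of a 
replica that looks healthy but was never able to sync.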


was (Author: gerlowskija):
Throwing in my 2 cents.  New to SolrCloud, so feel free to ignore...

+1 for having a check to ensure that a replica isn't marked as a leader unless 
it's had a chance to sync with a leader.

+1 for having ADDREPLICA calls fail if there are no active replicas.  I'd be 
fine with allowing API users to create not-ready-for-leadership replicas if 
there was a great way of conveying that caveat to them.  But short of adding a 
replica-state option to CLUSTERSTATUS, I can't think of a good way to do this.  
IMO, it seems cleaner conceptually to prevent users up front from getting into 
this state.  Bit hand-wavy though, so take this rationale with a grain of salt.

> A new replica should not become leader when all current replicas are down as 
> it leads to data loss
> --------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-8619
>                 URL: https://issues.apache.org/jira/browse/SOLR-8619
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Anshum Gupta
>
> Here's what I'm talking about:
> * Start a 2-node SolrCloud cluster
> * Create a 1 shard/1 replica collection
> * Add documents
> * Shut down the node that hosts the shard's only active replica
> * ADDREPLICA for the shard/collection, so Solr would attempt to add a new 
> replica on the other node
> * Solr waits for a while before this replica becomes an active leader.
> * Index a few new docs
> * Bring up the old node
> * The replica comes up with its old index and then syncs to contain only 
> the docs from the new leader.
>
> All old documents are lost in this case.
>
> Here are a few things that might work here:
> 1. Reject an ADDREPLICA call if all current replicas for the shard are down. 
> Since the new replica cannot sync from anyone, it doesn't make sense for 
> this replica to even come up.
> 2. The replica shouldn't become active/leader unless it was either the last 
> known leader or active before it went into the recovering state. The 
> exception is when there are no other replicas in the clusterstate.
> This might very well be related to SOLR-8173 but we should add a check to 
> ADDREPLICA as well.
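
The reproduction steps and option 2 from the description can be sketched 
with a toy model. Nothing here is Solr code: indexes are plain Python sets, 
and sync_from_leader(), reproduce_data_loss() and may_become_leader() are 
hypothetical helpers. The sketch assumes, as the description states, that a 
returning replica syncs to contain only the new leader's documents.

```python
def sync_from_leader(leader_docs):
    """Toy sync: the follower ends up with exactly the leader's documents."""
    return set(leader_docs)


def reproduce_data_loss():
    """Walk through the scenario and return the set of lost documents."""
    old_replica = {"doc1", "doc2", "doc3"}  # indexed before the shutdown
    # The old node goes down; ADDREPLICA creates an empty replica on the
    # other node, which eventually becomes leader with nothing to sync from.
    new_leader = set()
    new_leader |= {"doc4", "doc5"}  # a few new docs are indexed
    # The old node returns and syncs from the new leader.
    old_after_sync = sync_from_leader(new_leader)
    return old_replica - old_after_sync


def may_become_leader(was_last_leader, was_active_before_recovery,
                      other_replica_count):
    """Option 2 as a predicate (hypothetical signature)."""
    if other_replica_count == 0:
        return True  # nobody else could hold newer data
    return was_last_leader or was_active_before_recovery


if __name__ == "__main__":
    print("lost:", sorted(reproduce_data_loss()))
    # A freshly added replica with one (down) peer would be refused:
    print("new replica eligible:", may_become_leader(False, False, 1))
```

Under these assumptions the freshly added replica fails the eligibility 
check, so it would never be in a position to overwrite the old index.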



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

