[ 
https://issues.apache.org/jira/browse/SOLR-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243069#comment-14243069
 ] 

Shalin Shekhar Mangar commented on SOLR-6837:
---------------------------------------------

My reply to the email:

{quote}
A write in Solr, by default, is only guaranteed to exist in one place, i.e. the 
leader, and the safety valves that we have to preserve these writes are:

1. The leaderVoteWait time for which leader election is suspended until enough 
live replicas are available
2. The two-way peer-sync between leader candidate and other replicas
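As a rough illustration, the first safety valve can be modeled like this (a simplified sketch only; the real ShardLeaderElectionContext logic also handles sync attempts and retries):

```python
def may_become_leader(live_replicas, total_replicas, waited_ms, leader_vote_wait_ms):
    """Simplified model of the leaderVoteWait safety valve: leader election
    is suspended until all known replicas are live, or until the wait period
    expires, after which the candidate proceeds anyway."""
    if live_replicas >= total_replicas:
        return True  # everyone we know about is up; electing is safe
    # not everyone is up: only proceed once leaderVoteWait has elapsed
    return waited_ms >= leader_vote_wait_ms
```

In the reproduction below, node2 sees total=2 found=1, waits out the period, and then proceeds, which is exactly the unsafe case.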

The other safety valve is on the client side with the "min_rf" parameter 
introduced by SOLR-5468 in Solr 4.9. If you set this param to 2 while making 
the request then Solr will return the number of replicas to which it could 
successfully send the update. Depending on the response, you can then decide 
to retry the update at a later time, assuming it is idempotent. This puts the 
onus of ensuring consistency on the client side, which is not ideal but better 
than nothing. See SOLR-5468 for more discussion on this topic.
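A client-side retry loop could look roughly like this (a hedged sketch: send_update is a placeholder for the real Solr request issued with the min_rf parameter, returning the achieved replication factor that Solr reports back in the response):

```python
def index_with_retry(send_update, doc, min_rf=2, max_retries=3):
    """Retry an idempotent update until it reaches at least min_rf replicas.

    send_update(doc, min_rf) stands in for issuing the real Solr update
    request with the min_rf parameter set; it returns the achieved
    replication factor reported in the response.
    """
    achieved = 0
    for _ in range(max_retries):
        achieved = send_update(doc, min_rf)
        if achieved >= min_rf:
            return achieved  # enough replicas acknowledged the write
    raise RuntimeError("update reached only %d of %d replicas" % (achieved, min_rf))
```

Note the retry is only safe because the update is idempotent; a non-idempotent update (e.g. an atomic increment) could be applied twice.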

In your particular example, none of these safeties is invoked: you start 
node2 while node1 is down, and node2 goes ahead with leader election after the 
wait period. Also, since node1 is down during leader election, peer-sync 
doesn't happen and node2 becomes the leader.

When node1 comes back online and joins as a replica, it recovers from the 
leader using peer-sync (which returns the newest 100 updates) and finds that 
there's nothing newer on the leader. However, there is no check that the 
replica itself doesn't hold a newer update, which is why you end up with the 
inconsistent replica. If there had been a lot of updates on node2 (more than 
100) while node1 was down, peer-sync would not have been applicable; node1 
would instead have performed a full replication recovery and the inconsistency 
would have been resolved.
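The recovery decision can be sketched as follows (a simplified model: updates are represented by version numbers, and 100 is the peer-sync window mentioned above):

```python
PEER_SYNC_WINDOW = 100  # peer-sync exchanges only the newest 100 updates

def recover_from_leader(replica_versions, leader_versions):
    """One-way peer-sync as described above (simplified model).

    The recovering replica pulls updates it is missing from the leader but
    never checks whether it holds updates the leader lacks; only when it is
    too far behind does it fall back to full replication, discarding its own
    extra updates.
    """
    missing = set(leader_versions) - set(replica_versions)
    if len(missing) > PEER_SYNC_WINDOW:
        return sorted(leader_versions)  # full replication: copy the leader's index
    return sorted(set(replica_versions) | missing)  # peer-sync merge
```

With node1 holding versions [1, 2] (the 'B' update) and the new leader node2 holding only [1], peer-sync leaves node1 at [1, 2] while the leader stays at [1]: the steady-state inconsistency described above. Had the leader been more than 100 updates ahead, full replication would have discarded node1's extra update and resolved it.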

So yes, we have a valid consistency bug: the system reaches a steady state 
with inconsistent replicas. I wonder whether the right fix is to bump min_rf 
to a higher value or to peer-sync both ways during replica recovery. I'll need 
to think more on this.
{quote}
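One possible shape of the two-way peer-sync fix floated at the end of the quoted reply (purely illustrative, not a committed implementation):

```python
def recover_two_way(replica_versions, leader_versions):
    """Hypothetical two-way peer-sync: both sides exchange their recent
    updates, so a newer update held only by the recovering replica would
    also reach the leader instead of being silently retained on one node."""
    merged = sorted(set(replica_versions) | set(leader_versions))
    return merged, merged  # replica and leader converge on the same versions
```

In the scenario above this would push node1's 'B' update to node2 during recovery, so both nodes would answer queries consistently.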

> Inconsistent replicas when update is successful against leader partitioned 
> from all replicas
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-6837
>                 URL: https://issues.apache.org/jira/browse/SOLR-6837
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.10.2
>            Reporter: Shalin Shekhar Mangar
>              Labels: difficulty-hard, impact-medium
>
> Refer to the following question on solr-user:
> https://www.marshut.net/kttiuz/inconsistent-doc-value-across-two-nodes-very-simple-test-what-s-the-expected-behavior.html
> {quote}
> Config
> Solr 4.7.2 / Jetty.
> SolrCloud on two nodes, and 3 ZooKeeper instances, all running on localhost.
> single collection: single shard with two replicas.
> Reproducing:
> start node1 9.148.58.114:8983
> start node2 9.148.58.114:8984
> Cluster state: node1 leader. node2 active.
> index value 'A' (id="change me").
> query and expect 'A' -> success
> Stop node2
> Cluster state: node1 leader. node2 gone.
> query and expect 'A' -> success
> Update document value from 'A'->'B'
> query and expect 'B' -> success
> Stop node1
> then
> Start node2.
> Cluster state: node1 gone. node2 down.
>     104510 [coreZkRegister-1-thread-1] INFO  
> org.apache.solr.cloud.ShardLeaderElectionContext Waiting until we see more 
> replicas up for shard shard1: total=2 found=1 timeoutin=5.27665925E14ms
> wait 3m.
>     184679 [coreZkRegister-1-thread-1] INFO  
> org.apache.solr.cloud.ShardLeaderElectionContext  I am the new leader: 
> http://9.148.58.114:8984/solr/quick-results-collection_shard1_replica2/ 
> shard1    
> Cluster state: node1 gone. node2 leader.
> query and expect 'A' (old value) -> success
> start node1
> Cluster state: node1 active. node2 leader.
> Inconsistency: 
>     Querying node1 always returns 'B'. 
> http://localhost:8983/solr/quick-results-collection_shard1_replica1/select?q=*%3A*&wt=json&indent=true
> Querying node2 always returns 'A'. 
> http://localhost:8984/solr/quick-results-collection_shard1_replica2/select?q=*%3A*&wt=json&indent=true
> {quote}
> In such a case, the final steady state of the system has inconsistent 
> replicas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
