[ 
https://issues.apache.org/jira/browse/SOLR-10751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774996#comment-16774996
 ] 

Cao Manh Dat commented on SOLR-10751:
-------------------------------------

Hi [~tomasflobbe], here is my analysis of this case (assuming we go with #2 
for TLOG replicas).

Case 1: The wipe-out is done by a DBQ, so it will be present in the replica's 
tlog. Whatever happens later, whether the leader keeps serving or goes down, 
we are guaranteed that the replica will have enough data to continue. I think 
#2 is a great solution in this case: we avoid cases where both the leader's 
and the replica's indexes are empty but their tlogs contain different things. 
The only downside is that {{DBQ *:*}} will leave TLOG replicas out of sync 
with the leader until the next commit happens on the leader; this change in 
behaviour should be noted to users.

Case 2: The wipe-out is done without a DBQ and the leader stays healthy until 
the next commit. We are still fine here, since commit versions are generated 
incrementally, so only updates after the next commit are copied over.

Case 3: The wipe-out is done without a DBQ and the leader goes down before 
finishing the next commit. The state of the shard's index is unpredictable now.

> Master/Slave IndexVersion conflict
> ----------------------------------
>
>                 Key: SOLR-10751
>                 URL: https://issues.apache.org/jira/browse/SOLR-10751
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 7.0
>            Reporter: Tomás Fernández Löbbe
>            Assignee: Tomás Fernández Löbbe
>            Priority: Major
>         Attachments: SOLR-10751.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I’ve been looking at some failures in the replica types tests. One strange 
> failure I noticed is, master and slave share the same version, but have 
> different generation. The IndexFetcher code does more or less this:
> {code}
> masterVersion = fetchMasterVersion()
> masterGeneration = fetchMasterGeneration()
> if (masterVersion == 0 && slaveGeneration != 0 && forceReplication) {
>    delete my index
>    commit locally
>    return
> } 
> if (masterVersion != slaveVersion) {
>   fetchIndexFromMaster(masterGeneration)
> } else {
>   //do nothing, master and slave are in sync.
> }
> {code}
> The problem I see happens with this sequence of events:
> 1. delete index in master (not a DBQ=\*:\*, I mean a complete removal of the 
>    index files and reload of the core)
> 2. replication happens in slave (sees a version 0, deletes local index and 
>    commits)
> 3. add document in master and commit
> 4. if the commit in master and in the slave happen at the same millisecond*, 
>    they both end up with the same version, but different indices. 
> I think that in addition to checking for the same version, we should validate 
> that slave and master have the same generation and, if not, consider them not 
> in sync and proceed with the replication.
> True, this situation is unlikely to happen in a real prod environment and 
> it's more likely to affect tests, but I think the change makes sense. 
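
The check proposed in the description can be sketched as follows. This is a minimal illustration only, not the actual IndexFetcher code and not the attached patch: the class name {{SyncCheck}}, both method names, and the version and generation values are hypothetical. It compares the version-only test from the pseudocode with a test that also requires matching generations.

```java
// Hypothetical sketch; names and values are assumptions, not Solr APIs.
public final class SyncCheck {

    // Check as described in the pseudocode: only the index version is compared.
    public static boolean inSyncVersionOnly(long masterVersion, long slaveVersion) {
        return masterVersion == slaveVersion;
    }

    // Proposed check: version AND generation must both match.
    public static boolean inSync(long masterVersion, long masterGeneration,
                                 long slaveVersion, long slaveGeneration) {
        return masterVersion == slaveVersion
            && masterGeneration == slaveGeneration;
    }

    public static void main(String[] args) {
        // Both commits landed on the same millisecond, so the versions collide
        // (made-up timestamp), while the generations differ (made-up values).
        long version = 1_500_000_000_000L;
        long masterGeneration = 2L;
        long slaveGeneration = 3L;

        // Version-only check wrongly reports the two cores as in sync.
        System.out.println(inSyncVersionOnly(version, version));
        // Version + generation check detects the mismatch.
        System.out.println(inSync(version, masterGeneration, version, slaveGeneration));
    }
}
```

With the colliding version, the version-only check treats the cores as in sync and skips replication, while the combined check reports a mismatch and would let replication proceed.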



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
