[
https://issues.apache.org/jira/browse/SOLR-10751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16026695#comment-16026695
]
Tomás Fernández Löbbe commented on SOLR-10751:
----------------------------------------------
OK, I see now why this hasn't been a problem so far. Note that the "delete my
index" only happens in case of a "forced replication". Forced replications in
Master/Slave can only happen in a retry, which should not happen if the master
is returning version 0 (unless I'm misunderstanding something here, this code
should never be executed if you are running Master/Slave). In SolrCloud mode, a
forced replication can happen if the last attempt to replicate was
unsuccessful. Until now, replication in SolrCloud was used only for recovery,
and in Cloud mode it's "OK" to have different versions of the index. Plus, in
the particular test example I described in the issue, the replication would
have been followed by the application of the buffered updates, so the indices
would soon be in sync. This becomes an issue only now that we have TLOG and
PULL replicas.
In any case, we need to fix it now for the new scenario. I also like your #2
option (#1 sounds like too big of a change), and it should be easy to
implement, although I believe NRT replicas still need this logic.
> Master/Slave IndexVersion conflict
> ----------------------------------
>
> Key: SOLR-10751
> URL: https://issues.apache.org/jira/browse/SOLR-10751
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Affects Versions: master (7.0)
> Reporter: Tomás Fernández Löbbe
> Assignee: Tomás Fernández Löbbe
> Attachments: SOLR-10751.patch
>
>
> I’ve been looking at some failures in the replica types tests. One strange
> failure I noticed is that master and slave share the same version but have
> different generations. The IndexFetcher code does more or less this:
> {code}
> masterVersion = fetchMasterVersion()
> masterGeneration = fetchMasterGeneration()
> if (masterVersion == 0 && slaveGeneration != 0 && forceReplication) {
> delete my index
> commit locally
> return
> }
> if (masterVersion != slaveVersion) {
> fetchIndexFromMaster(masterGeneration)
> } else {
> //do nothing, master and slave are in sync.
> }
> {code}
> The problem I see happens with this sequence of events:
> 1. delete index in master (not a DBQ=*:*, I mean a complete removal of the
>    index files and reload of the core)
> 2. replication happens in slave (sees a version 0, deletes local index and
>    commits)
> 3. add document in master and commit
> 4. if the commit in master and in the slave happen at the same millisecond*,
>    they both end up with the same version, but different indices.
> I think that in addition to checking for the same version, we should validate
> that slave and master have the same generation and, if not, consider them not
> in sync and proceed with the replication.
> True, this situation is unlikely to happen in a real prod environment and
> it's more likely to affect tests, but I think the change makes sense.
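
To make the proposed check concrete, here is a minimal sketch comparing the version-only test from the pseudocode with one that also compares generations. This is not the actual IndexFetcher code: the `SyncCheck` class, the `IndexState` holder, both method names, and the version and generation values are hypothetical, chosen only to mirror the sequence of events described above.

```java
// Illustrative sketch only; none of these names come from Solr.
final class SyncCheck {

    /** Hypothetical holder for the two values the description compares. */
    static final class IndexState {
        final long version;     // commit timestamp in milliseconds
        final long generation;  // commit generation of the index

        IndexState(long version, long generation) {
            this.version = version;
            this.generation = generation;
        }
    }

    /** Version-only comparison, as in the pseudocode in the description. */
    static boolean inSyncByVersionOnly(IndexState master, IndexState slave) {
        return master.version == slave.version;
    }

    /** Proposed comparison: both version and generation must match. */
    static boolean inSync(IndexState master, IndexState slave) {
        return master.version == slave.version
                && master.generation == slave.generation;
    }

    public static void main(String[] args) {
        // Assumed values: both commits land on the same millisecond, but the
        // two indices have gone through a different number of commits.
        long sameMillisecond = 1495800000000L;
        IndexState master = new IndexState(sameMillisecond, 2L);
        IndexState slave = new IndexState(sameMillisecond, 5L);

        // The version-only check wrongly reports "in sync".
        System.out.println(inSyncByVersionOnly(master, slave)); // prints "true"
        // Adding the generation comparison detects the mismatch.
        System.out.println(inSync(master, slave));              // prints "false"
    }
}
```

With the second check, the case from the description (same version, different generation) would be treated as out of sync and would fall through to the fetch path instead of the "do nothing" branch.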