[
https://issues.apache.org/jira/browse/SOLR-10751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025410#comment-16025410
]
Tomás Fernández Löbbe commented on SOLR-10751:
----------------------------------------------
[~hossman] and I had a conversation about this on IRC yesterday, and his
concern was "Why is master creating an index with version 0 and the slave is
not". After investigating some more, I noticed this code in the
{{ReplicationHandler}}:
{code:java}
if (commitPoint != null && replicationEnabled.get()) {
  //
  // There is a race condition here. The commit point may be changed / deleted by the time
  // we get around to reserving it. This is a very small window though, and should not result
  // in a catastrophic failure, but will result in the client getting an empty file list for
  // the CMD_GET_FILE_LIST command.
  //
  core.getDeletionPolicy().setReserveDuration(commitPoint.getGeneration(), reserveCommitDuration);
  rsp.add(CMD_INDEX_VERSION, IndexDeletionPolicyWrapper.getCommitTimestamp(commitPoint));
  rsp.add(GENERATION, commitPoint.getGeneration());
} else {
  // This happens when replication is not configured to happen after startup and no commit/optimize
  // has happened yet.
  rsp.add(CMD_INDEX_VERSION, 0L);
  rsp.add(GENERATION, 0L);
}
{code}
So "0" is not really the version of the index; it is what the master
responds to the slaves when there is no replicable index.
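
To make that distinction concrete, here is a minimal sketch of how the response could be modeled so that 0 reads as a "nothing to replicate" sentinel rather than a real version. The class {{IndexVersionResponse}}, its factory {{of}}, and {{hasReplicableIndex}} are hypothetical names for illustration only; they are not Solr APIs, and the sketch only mirrors the branch quoted above.

```java
// Hypothetical model of the master's index-version response (not a Solr class).
final class IndexVersionResponse {
    final long indexVersion;
    final long generation;

    private IndexVersionResponse(long indexVersion, long generation) {
        this.indexVersion = indexVersion;
        this.generation = generation;
    }

    /**
     * Mirrors the quoted branch: real values are returned only when a commit
     * point exists and replication is enabled; otherwise both fields are 0.
     * A null timestamp or generation stands in for "no commit point".
     */
    static IndexVersionResponse of(Long commitTimestamp, Long commitGeneration,
                                   boolean replicationEnabled) {
        if (commitTimestamp != null && commitGeneration != null && replicationEnabled) {
            return new IndexVersionResponse(commitTimestamp, commitGeneration);
        }
        return new IndexVersionResponse(0L, 0L);
    }

    /** Assumption of this sketch: version 0 means "no replicable index". */
    boolean hasReplicableIndex() {
        return indexVersion != 0L;
    }
}
```

A caller on the slave side could then test {{hasReplicableIndex()}} before comparing versions, instead of treating 0 as a version number.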
> Master/Slave IndexVersion conflict
> ----------------------------------
>
> Key: SOLR-10751
> URL: https://issues.apache.org/jira/browse/SOLR-10751
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Affects Versions: master (7.0)
> Reporter: Tomás Fernández Löbbe
> Assignee: Tomás Fernández Löbbe
>
> I’ve been looking at some failures in the replica types tests. One strange
> failure I noticed is that master and slave share the same version but have
> different generations. The IndexFetcher code does more or less this:
> {code}
> masterVersion = fetchMasterVersion()
> masterGeneration = fetchMasterGeneration()
> if (masterVersion == 0 && slaveGeneration != 0 && forceReplication) {
>   delete my index
>   commit locally
>   return
> }
> if (masterVersion != slaveVersion) {
>   fetchIndexFromMaster(masterGeneration)
> } else {
>   // do nothing, master and slave are in sync.
> }
> {code}
> The problem I see happens with this sequence of events:
> 1. Delete the index in master (not a DBQ=*:*, I mean a complete removal of
>    the index files and reload of the core).
> 2. Replication happens in slave (it sees a version 0, deletes the local
>    index and commits).
> 3. Add a document in master and commit.
> 4. If the commit in master and the commit in the slave happen at the same
>    millisecond*, they both end up with the same version, but different indices.
> I think that in addition to checking for the same version, we should validate
> that slave and master have the same generation and, if not, consider them not
> in sync and proceed with the replication.
> * True, this is a situation that is unlikely to happen in a real prod
> environment and is more likely to affect tests, but I think the change
> makes sense.
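
The proposal in the description above can be sketched as a small decision function. This is only an illustration under stated assumptions: {{ReplicationDecision}}, {{Action}}, and {{decide}} are hypothetical names, not the actual IndexFetcher code, and the logic follows the pseudocode in the description with the one proposed change (also comparing generations).

```java
// Hypothetical sketch of the fetch decision with the proposed generation check.
final class ReplicationDecision {
    enum Action { CLEAR_LOCAL_INDEX, FETCH_INDEX, NOTHING }

    static Action decide(long masterVersion, long masterGeneration,
                         long slaveVersion, long slaveGeneration,
                         boolean forceReplication) {
        // Master reports no replicable index: drop the local index and commit.
        if (masterVersion == 0L && slaveGeneration != 0L && forceReplication) {
            return Action.CLEAR_LOCAL_INDEX;
        }
        // Proposed change: equal versions are not enough, generations must match too.
        if (masterVersion != slaveVersion || masterGeneration != slaveGeneration) {
            return Action.FETCH_INDEX;
        }
        // Same version and same generation: treated as in sync.
        return Action.NOTHING;
    }
}
```

With a version-only comparison, the case of equal versions and different generations would be treated as in sync; with the extra check it results in a fetch.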