[jira] [Commented] (SOLR-10751) Master/Slave IndexVersion conflict

JIRA Thu, 25 May 2017 14:15:30 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-10751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025410#comment-16025410
 ]


Tomás Fernández Löbbe commented on SOLR-10751:
----------------------------------------------

[~hossman] and I had a conversation about this on IRC yesterday, and his 
concern was: "Why is the master creating an index with version 0 when the 
slave is not?" After investigating some more, I noticed this code in the 
{{ReplicationHandler}}:
{code:java}
if (commitPoint != null && replicationEnabled.get()) {
  //
  // There is a race condition here.  The commit point may be changed / deleted by the time
  // we get around to reserving it.  This is a very small window though, and should not result
  // in a catastrophic failure, but will result in the client getting an empty file list for
  // the CMD_GET_FILE_LIST command.
  //
  core.getDeletionPolicy().setReserveDuration(commitPoint.getGeneration(), reserveCommitDuration);
  rsp.add(CMD_INDEX_VERSION, IndexDeletionPolicyWrapper.getCommitTimestamp(commitPoint));
  rsp.add(GENERATION, commitPoint.getGeneration());
} else {
  // This happens when replication is not configured to happen after startup and no commit/optimize
  // has happened yet.
  rsp.add(CMD_INDEX_VERSION, 0L);
  rsp.add(GENERATION, 0L);
}
{code}
So "0" is not really the version of the index; it is what the master 
responds with to the slaves when there is no replicable index.
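
To make the consequence concrete, here is a minimal sketch of the slave-side decision, following the pseudocode in the issue description quoted below and adding the generation comparison proposed there. This is not the actual {{IndexFetcher}} code: the class {{SyncCheck}}, the enum {{Action}} and the method {{decide}} are hypothetical names used only for illustration.

```java
// Hypothetical sketch, not Solr code: models the slave's decision after
// polling the master for its index version and generation.
final class SyncCheck {

    /** What the slave should do after polling the master. */
    enum Action { CLEAR_LOCAL_INDEX, FETCH_INDEX, NOTHING }

    // Value the master reports for both version and generation when it
    // has no replicable commit point (see the else branch above).
    static final long NO_REPLICABLE_INDEX = 0L;

    static Action decide(long masterVersion, long masterGeneration,
                         long slaveVersion, long slaveGeneration,
                         boolean forceReplication) {
        // Master has nothing to replicate: drop the local index and commit.
        if (masterVersion == NO_REPLICABLE_INDEX
                && slaveGeneration != 0L
                && forceReplication) {
            return Action.CLEAR_LOCAL_INDEX;
        }
        // Proposed change: equal versions are not enough; the generations
        // must match as well for the two indices to count as in sync.
        if (masterVersion != slaveVersion || masterGeneration != slaveGeneration) {
            return Action.FETCH_INDEX;
        }
        return Action.NOTHING;
    }
}
```

With this check, a master and a slave that committed in the same millisecond (same version) but ended up on different generations would be treated as out of sync, and the slave would fetch the index.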

> Master/Slave IndexVersion conflict
> ----------------------------------
>
>                 Key: SOLR-10751
>                 URL: https://issues.apache.org/jira/browse/SOLR-10751
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: master (7.0)
>            Reporter: Tomás Fernández Löbbe
>            Assignee: Tomás Fernández Löbbe
>
> I’ve been looking at some failures in the replica types tests. One strange 
> failure I noticed is that master and slave share the same version but have 
> different generations. The IndexFetcher code does more or less this:
> {code}
> masterVersion = fetchMasterVersion()
> masterGeneration = fetchMasterGeneration()
> if (masterVersion == 0 && slaveGeneration != 0 && forceReplication) {
>    delete my index
>    commit locally
>    return
> } 
> if (masterVersion != slaveVersion) {
>   fetchIndexFromMaster(masterGeneration)
> } else {
>   //do nothing, master and slave are in sync.
> }
> {code}
> The problem I see happens with this sequence of events:
> 1. Delete the index in master (not a DBQ=*:*; I mean a complete removal of 
>    the index files and a reload of the core).
> 2. Replication happens in the slave (it sees version 0, deletes its local 
>    index and commits).
> 3. Add a document in master and commit.
> 4. If the commit in master and the commit in the slave happen in the same 
>    millisecond*, they both end up with the same version but different indices.
> I think that, in addition to checking for the same version, we should validate 
> that slave and master have the same generation and, if not, consider them out 
> of sync and proceed with the replication.
> True, this situation is unlikely to occur in a real prod environment and is 
> more likely to affect tests, but I think the change makes sense.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

