[
https://issues.apache.org/jira/browse/SOLR-10751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774781#comment-16774781
]
Tomás Fernández Löbbe commented on SOLR-10751:
----------------------------------------------
I created a PR with #2, still WIP. In the PR, I only handle the version 0 case
differently for PULL replicas; however, [~caomanhdat] did something related for
TLOG replicas. For TLOG replicas there is no commit; instead, the replica opens
a new searcher and updates the commit point in the {{IndexFetcher}}. I'm
guessing this is so that the TLOG replica shows 0 results for searches and so
that, if it becomes the leader, the followers replicate the empty index from
it. I'm wondering whether TLOG replicas should actually behave the same as
PULL replicas, with no replication happening in the version 0 case.
[~caomanhdat], [~shalinmangar], your input would be great.
As for testing, both {{TestPullReplica}} and {{TestTlogReplica}} are disabled
with {{@AwaitsFix}} at this point. I enabled {{TestPullReplica}} and it's in
good shape. {{TestTlogReplica}} had many failures, which I'm going to look
into. {{ChaosMonkeyNothingIsSafeWithPullReplicasTest}} is also looking
better (1 failure after 1k runs, and it's an object leak that seems related to
this {{openNewSearcherAndUpdateCommitPoint}} code, actually).
> Master/Slave IndexVersion conflict
> ----------------------------------
>
> Key: SOLR-10751
> URL: https://issues.apache.org/jira/browse/SOLR-10751
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Affects Versions: 7.0
> Reporter: Tomás Fernández Löbbe
> Assignee: Tomás Fernández Löbbe
> Priority: Major
> Attachments: SOLR-10751.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> I’ve been looking at some failures in the replica types tests. One strange
> failure I noticed is that master and slave share the same version but have
> different generations. The IndexFetcher code does more or less this:
> {code}
> masterVersion = fetchMasterVersion()
> masterGeneration = fetchMasterGeneration()
> if (masterVersion == 0 && slaveGeneration != 0 && forceReplication) {
>   delete my index
>   commit locally
>   return
> }
> if (masterVersion != slaveVersion) {
>   fetchIndexFromMaster(masterGeneration)
> } else {
>   //do nothing, master and slave are in sync.
> }
> {code}
> The problem I see happens with this sequence of events:
> 1. Delete the index in master (not a DBQ=\*:\*, I mean a complete removal of
>    the index files and a reload of the core).
> 2. Replication happens in slave (it sees version 0, deletes the local index,
>    and commits).
> 3. Add a document in master and commit.
> 4. If the commit in master and the commit in slave happen at the same
>    millisecond*, they both end up with the same version but different indices.
> I think that, in addition to checking for the same version, we should
> validate that slave and master have the same generation and, if not, consider
> them out of sync and proceed with replication.
> True, this situation is unlikely to happen in a real prod environment and is
> more likely to affect tests, but I think the change makes sense.
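
The check proposed in the description can be sketched roughly as follows. This
is only an illustration of the decision logic, not the actual
{{IndexFetcher}} code: the class {{ReplicationCheck}}, the method {{decide}},
the {{Action}} enum, and the version and generation values used in {{main}}
are all hypothetical names and numbers chosen for the example.

```java
// Hypothetical sketch of the sync decision described in the issue.
// None of these names exist in Solr; they only illustrate the idea.
final class ReplicationCheck {

    enum Action { RESET_LOCAL_INDEX, FETCH_INDEX, IN_SYNC }

    static Action decide(long masterVersion, long masterGeneration,
                         long slaveVersion, long slaveGeneration,
                         boolean forceReplication) {
        // Version 0 on the master means it has an empty index:
        // drop the local index and commit locally.
        if (masterVersion == 0 && slaveGeneration != 0 && forceReplication) {
            return Action.RESET_LOCAL_INDEX;
        }
        // Proposed change: a matching version is not enough. The
        // generation must match as well, otherwise replicate.
        if (masterVersion != slaveVersion
                || masterGeneration != slaveGeneration) {
            return Action.FETCH_INDEX;
        }
        return Action.IN_SYNC;
    }

    public static void main(String[] args) {
        // Assumed scenario: both commits landed on the same millisecond,
        // so the versions collide while the generations differ.
        long sameMillis = 1_550_000_000_000L;
        System.out.println(decide(sameMillis, 2L, sameMillis, 3L, false));
        // prints FETCH_INDEX
    }
}
```

With the version-only comparison from the pseudocode, the same inputs would be
treated as in sync, which is the conflict described above.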