Shalin Shekhar Mangar created SOLR-7427:
-------------------------------------------
Summary: Recovery can miss some updates when they're neither
forwarded nor present in replicated index
Key: SOLR-7427
URL: https://issues.apache.org/jira/browse/SOLR-7427
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 5.1, 4.10.4
Reporter: Shalin Shekhar Mangar
This issue follows from the discussion in SOLR-7141. See [[email protected]]'s comment at
https://issues.apache.org/jira/browse/SOLR-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501622#comment-14501622
{quote}
From memory, here's how it's supposed to work:
1. replica tells leader it wants to recover
2. leader starts forwarding updates to replica (which the replica buffers since
it's in recovery)
3. leader executes a hard commit (so replica can replicate the current index)
4. replica starts replicating index from the last leader commit point
Note that the ordering of #2 and #3 is very important. If we did #3 first and
then #2 after, some updates won't make it into the commit and also won't be
forwarded to the replica (and that leads to data loss).
Now the issue: even though we do #2 first and #3 after... it's possible to have
an unfortunately scheduled update in a different thread that started before we
did #2, and doesn't complete until after #3, so that update was not forwarded,
and it's also not in the replicated index. The sleep (which should be between
steps #2 and #3) is to try and give time for this update to complete and make
it into the index.
It occurs to me that the lucene IndexWriter thread stealing (same issue that
caused this: SOLR-6820) could make this much more likely than we would have
thought.
One possible alternative is to block updates for a commit of this type
(replication commit). Any blocked updates would need to see that they need to
be forwarded to the replica too (once they are unblocked) - I don't know if the
code is currently written that way.
{quote}
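The alternative suggested in the quoted comment (block updates for the replication commit, and make sure unblocked updates get forwarded) can be sketched roughly as below. This is a toy model only, not Solr's code: the class RecoveryGateSketch, its methods, and the in-memory lists standing in for the index and the replica's buffer are all hypothetical. It assumes that "enable forwarding" and "take the commit point" can be done together under one write lock, while each update holds the matching read lock for its whole duration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical leader model; none of these names exist in Solr. */
final class RecoveryGateSketch {
    private final ReentrantReadWriteLock gate = new ReentrantReadWriteLock();
    // Stand-ins for the leader's index and for updates forwarded to the replica.
    private final List<String> index = Collections.synchronizedList(new ArrayList<>());
    private final List<String> forwarded = Collections.synchronizedList(new ArrayList<>());
    private boolean forwarding = false; // written under the write lock, read under the read lock

    /** An update holds the read lock from start to finish, so it cannot straddle the commit. */
    void update(String doc) {
        gate.readLock().lock();
        try {
            index.add(doc);
            if (forwarding) {
                forwarded.add(doc);
            }
        } finally {
            gate.readLock().unlock();
        }
    }

    /** Steps 2 and 3 as one step: wait for in-flight updates, enable forwarding, snapshot the index. */
    List<String> startRecovery() {
        gate.writeLock().lock();
        try {
            forwarding = true;
            return new ArrayList<>(index); // models the commit point the replica replicates
        } finally {
            gate.writeLock().unlock();
        }
    }

    List<String> forwardedUpdates() {
        return new ArrayList<>(forwarded);
    }

    /** Runs concurrent writers around startRecovery() and checks that no update is lost. */
    static boolean everyUpdateAccountedFor(int writers, int docsPerWriter) {
        RecoveryGateSketch leader = new RecoveryGateSketch();
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < writers; w++) {
            final int id = w;
            Thread t = new Thread(() -> {
                for (int i = 0; i < docsPerWriter; i++) {
                    leader.update("w" + id + "-" + i);
                }
            });
            threads.add(t);
            t.start();
        }
        List<String> commitPoint = leader.startRecovery();
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // Every document must be in the commit point, in the forwarded set, or both.
        Set<String> seen = new HashSet<>(commitPoint);
        seen.addAll(leader.forwardedUpdates());
        return seen.size() == writers * docsPerWriter;
    }

    public static void main(String[] args) {
        System.out.println("all updates accounted for: " + everyUpdateAccountedFor(4, 1000));
    }
}
```

In this model an update either finishes before the write lock is taken (so it is in the commit point) or starts after forwarding is enabled (so it is forwarded), which removes the window that the sleep only tries to cover. Whether Solr's update path can be gated this way without hurting indexing throughput is a separate question that the sketch does not address.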
So there is some protection against such a situation, but it relies on two
timing-based waits rather than on a guarantee:
# The replica stalls recovery until the leader acknowledges that it has indeed
seen the replica in the 'recovery' state (via the prep recovery core admin API)
# The replica sleeps for 7 seconds by default (configurable via the hidden
"solr.cloud.wait-for-updates-with-stale-state-pause" system property) after
prep recovery completes, to give additional time for such updates to complete.
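For illustration, a pause like the second one could be read from the system property as sketched below. The property name and the 7 second default come from this issue. The class and method names are hypothetical, and treating the value as milliseconds is an assumption.

```java
/** Hypothetical helper; only the property name and the 7 second default come from the issue. */
final class StaleStatePauseSketch {
    static final String PAUSE_PROPERTY = "solr.cloud.wait-for-updates-with-stale-state-pause";
    static final int DEFAULT_PAUSE_MS = 7000; // 7 seconds; the millisecond unit is an assumption

    /** Returns the configured pause, falling back to the default when the property is unset or not a number. */
    static int pauseMillis() {
        return Integer.getInteger(PAUSE_PROPERTY, DEFAULT_PAUSE_MS);
    }

    public static void main(String[] args) {
        System.out.println("pause before replicating: " + pauseMillis() + " ms");
    }
}
```

Under these assumptions, starting the JVM with -Dsolr.cloud.wait-for-updates-with-stale-state-pause=15000 would lengthen the pause to 15 seconds. A longer pause only narrows the window for the race described above. It does not close it.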