[
https://issues.apache.org/jira/browse/SOLR-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153422#comment-15153422
]
Ishan Chattopadhyaya edited comment on SOLR-8687 at 2/19/16 12:23 AM:
----------------------------------------------------------------------
bq. it's also disconcerting if it's not air-tight and our tests don't catch it
With [~steve_rowe]'s help, I stress tested 1000 rounds (16 at a time) of
StressReorderTest and it didn't fail.
However, the above-mentioned test, which is similar to StressReorderTest but
runs on a 3-node cluster instead of a simulated replica, failed around 10 times
(each time with this exact failure). I had also increased the number of read
operations within each test from 50k to 200k, which comes to 10 failures out of
3.2 billion reads (1000 rounds x 16 concurrent runs x 200k reads each, counting
both searches and RTGs). At this point, I am reasonably sure that this failure
has nothing to do with my other changes. Next, I shall isolate the test from
the other changes and run it against a fresh master to confirm that I can
reproduce it.
> Race condition with RTGs during soft commit
> -------------------------------------------
>
> Key: SOLR-8687
> URL: https://issues.apache.org/jira/browse/SOLR-8687
> Project: Solr
> Issue Type: Bug
> Reporter: Ishan Chattopadhyaya
>
> I am facing a problem while stress testing SOLR-5944, though I believe the
> problem exists in Solr even without my changes.
> The symptom is that during a stress test (similar to TestStressReorder), an
> RTG returns a document whose version is older than that of the last
> acknowledged write.
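> To make the failing invariant concrete, here is a minimal sketch of the check
> such a stress test performs after every verified write (the helper names are
> hypothetical, not the actual test's API):
> {code}
> // Hypothetical helpers; the real test keeps its own model of committed data.
> long acked = indexDocAndGetVersion(id);   // _version_ of the acknowledged update
> long seen  = realTimeGetVersion(id);      // _version_ returned by /get for the same id
> if (seen < acked) {                       // the stale RTG this issue describes
>   throw new AssertionError("RTG returned stale version " + seen + " < " + acked);
> }
> {code}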
> Possible reason:
> {code}
> // in DirectUpdateHandler2 (DUH2)'s commit()
> ...
> 1: if (cmd.softCommit) {
> 2:   // ulog.preSoftCommit();
> 3:   synchronized (solrCoreState.getUpdateLock()) {
> 4:     if (ulog != null) ulog.preSoftCommit(cmd);        // rotates the ulog's version maps
> 5:     core.getSearcher(true, false, waitSearcher, true); // new searcher opens asynchronously
> 6:     if (ulog != null) ulog.postSoftCommit(cmd);       // clears the previous maps
> 7:   }
> 8:   callPostSoftCommitCallbacks();
> 9: }
> ...
> ...
> {code}
> * Before line 1, there was an update (say id=2) recorded in the ulog's map.
> The maps are, say: map={2=LogPtr(1234)}, prevMap={...}, prevMap2={...}
> * At line 4 (ulog.preSoftCommit()), the maps are rotated, so id=2 moves to
> prevMap: map={}, prevMap={2=LogPtr(1234)}, prevMap2={...}. Up to this point,
> an RTG for id=2 still works.
> * At line 5, a new searcher is due to be opened. But this happens
> asynchronously; let's assume it doesn't complete before the next few lines
> execute.
> * At line 6 (ulog.postSoftCommit()), the previous maps are cleared out. Now
> the maps are: map={}, prevMap=null, prevMap2=null
> * If an RTG for id=2 arrives now, it cannot be served from the ulog's maps,
> so it falls through to a search against the last opened searcher. But the
> searcher due to be opened at line 5 hasn't opened yet, so the returned
> document is whatever version of id=2 the previous searcher had (see the
> sketch after this list).
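> The window can be reproduced with a minimal, self-contained sketch of the
> three-map rotation (hypothetical class and method names, not the real
> UpdateLog API):
> {code}
> import java.util.HashMap;
> import java.util.Map;
>
> class UlogMapsSketch {
>   static Map<Integer, Long> map = new HashMap<>(); // current tlog pointers
>   static Map<Integer, Long> prevMap = null;
>   static Map<Integer, Long> prevMap2 = null;
>
>   static void preSoftCommit() {   // line 4: rotate the maps
>     prevMap2 = prevMap;
>     prevMap = map;
>     map = new HashMap<>();
>   }
>
>   static void postSoftCommit() {  // line 6: assumes the new searcher is live
>     prevMap = null;
>     prevMap2 = null;
>   }
>
>   static Long lookup(int id) {    // RTG path: try the maps, else fall through
>     if (map.containsKey(id)) return map.get(id);
>     if (prevMap != null && prevMap.containsKey(id)) return prevMap.get(id);
>     if (prevMap2 != null && prevMap2.containsKey(id)) return prevMap2.get(id);
>     return null;                  // i.e. served by the (possibly stale) searcher
>   }
>
>   public static void main(String[] args) {
>     map.put(2, 1234L);            // update for id=2 recorded in the ulog
>     preSoftCommit();              // id=2 moves to prevMap; RTG still works
>     // ... the new searcher has NOT finished opening yet ...
>     postSoftCommit();             // maps cleared anyway
>     System.out.println(lookup(2)); // null: RTG now hits the OLD searcher
>   }
> }
> {code}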
> Can someone please confirm whether this is a real problem? If so, any
> suggestions for a fix? I tried calling ulog.openRealtimeSearcher() inside the
> above synchronized block (roughly as sketched below), but the problem still
> persisted; I haven't yet looked into why.
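> For reference, the attempted variant looked roughly like this (a sketch of
> what I tried, not the actual patch):
> {code}
> if (cmd.softCommit) {
>   synchronized (solrCoreState.getUpdateLock()) {
>     if (ulog != null) ulog.preSoftCommit(cmd);
>     core.getSearcher(true, false, waitSearcher, true);
>     if (ulog != null) {
>       ulog.openRealtimeSearcher(); // attempted fix: force a fresh realtime
>       ulog.postSoftCommit(cmd);    // searcher before the old maps are cleared
>     }
>   }
>   callPostSoftCommitCallbacks();
> }
> {code}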