[
https://issues.apache.org/jira/browse/SOLR-12946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670489#comment-16670489
]
Hoss Man commented on SOLR-12946:
---------------------------------
Thanks kevin ... i was comparing your failure logs with logs from a successful
run, expecting to see a delay in a commit or searcher open to explain how the
control collection might be out of sync with the distrib collection, but i
didn't find anything -- however the full log is more verbose about the
mismatch in the responses...
{noformat}
[junit4] 2> 88972 ERROR
(TEST-DistribCursorPagingTest.test-seed#[87B400F0180614A3]) [ ]
o.a.s.BaseDistributedSearchTestCase Mismatched responses:
[junit4] 2>
{responseHeader={zkConnected=true,status=0,QTime=31},response={numFound=8,start=0,docs=[SolrDocument{id=7},
SolrDocument{id=0},
SolrDocument{id=3}]},nextCursorMark=AoIGAAAAACEz,facet_counts={facet_queries={},facet_fields={str={a=4,c=3,b=1,x=0,z=0}},facet_ranges={},facet_intervals={},facet_heatmaps={}}}
[junit4] 2>
{responseHeader={zkConnected=true,status=0,QTime=10},response={numFound=8,start=0,docs=[SolrDocument{id=7},
SolrDocument{id=0},
SolrDocument{id=3}]},nextCursorMark=AoIGAAAAACEz,facet_counts={facet_queries={},facet_fields={str={a=4,c=3,b=1}},facet_ranges={},facet_intervals={},facet_heatmaps={}}}
{noformat}
...which helped me realize what i hadn't noticed before: the mismatched facet
values from the failure are specifically for "x" (null != 0) although it would
clearly fail for "z" as well once it got to it .. but those values aren't even
expected at that part of the test -- all docs with those values have been
deleted.
----
i'd bet money that the discrepancy/failure is coming from this...
* the docs with values 'x' and 'z' get deleted
* variability exists in what happens in the background before the next
commit/newSearcher...
** on a "fast" machine (w/o thread contention), one of 2 things happens
depending on the randomized merge policy in effect for the seed:
*** "1" a background merge happens on both collections
*** "2" a background merge does _not_ happen on either collection
** on a "slow" machine (w/thread contention) #1 and #2 may have happened
inconsistently and we get...
*** "3" a background merge happens on the control collection, but _not_ on a
replica of the distrib collection
* both collections get the {{commit}} command from the test thread and open a
new searcher
* the facet request is issued
** in case #1, or #2, the results are consistent between the control collection
and the distrib collection -- the test does not itself verify all the facet
buckets returned
** in case #3 the test framework detects a mismatch between the responses,
because one collection still sees that the "x" and "z" terms exist in the index
-- they haven't been merged away -- and returns "0" counts for them.
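
The case #3 mismatch can be reproduced in miniature without Solr. The following is only a sketch in plain Java, not the actual comparison code in BaseDistributedSearchTestCase: the class name {{FacetMismatchSketch}} and the helper {{firstMismatch}} are hypothetical, and the message format is merely modelled on the assertion text in the failure. The facet values are the ones from the log above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the response comparison; not Solr code.
public class FacetMismatchSketch {

    /** Describes the first key whose count differs, or returns null if the maps agree. */
    static String firstMismatch(Map<String, Integer> a, Map<String, Integer> b) {
        // Keys present in a: the count in b must be identical (a missing key shows up as null).
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (!e.getValue().equals(other)) {
                return e.getKey() + ":" + e.getValue() + "!=" + other;
            }
        }
        // Keys present only in b.
        for (Map.Entry<String, Integer> e : b.entrySet()) {
            if (!a.containsKey(e.getKey())) {
                return e.getKey() + ":null!=" + e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // A replica that has not merged away the deleted docs still lists "x" and "z".
        Map<String, Integer> unmerged = new LinkedHashMap<>();
        unmerged.put("a", 4);
        unmerged.put("c", 3);
        unmerged.put("b", 1);
        unmerged.put("x", 0);
        unmerged.put("z", 0);

        // A collection whose background merge already ran no longer has those terms.
        Map<String, Integer> merged = new LinkedHashMap<>();
        merged.put("a", 4);
        merged.put("c", 3);
        merged.put("b", 1);

        System.out.println(firstMismatch(unmerged, merged)); // prints x:0!=null
    }
}
```

The comparison stops at the first difference, which is why only "x" is reported even though "z" would fail the same way.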
----
I'm not sure if there is a general purpose lesson/fix that can be made here
regarding the way background merges can happen in tests and what kinds of
discrepancies in behavior they can cause, but this particular test can be
tightened up by using {{facet.mincount=1}} since that doesn't undermine the
intent of the test.
I'll look into that and try to commit asap (but first i want to see if i can
semi-reliably reproduce locally while hammering my machine with load)
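
To illustrate why {{facet.mincount=1}} should hide the difference, here is a rough sketch in plain Java. It only models the effect of the parameter on the returned buckets; in Solr the filtering happens server side, and the class name {{FacetMincountSketch}} and helper {{applyMincount}} are hypothetical, not part of the test or of Solr.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the effect of facet.mincount; not Solr code.
public class FacetMincountSketch {

    /** Keeps only the facet buckets whose count is at least mincount. */
    static Map<String, Integer> applyMincount(Map<String, Integer> counts, int mincount) {
        Map<String, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= mincount) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Facet counts from a replica that still has the deleted terms in its index.
        Map<String, Integer> unmerged = new LinkedHashMap<>();
        unmerged.put("a", 4);
        unmerged.put("c", 3);
        unmerged.put("b", 1);
        unmerged.put("x", 0);
        unmerged.put("z", 0);

        // Facet counts from a collection where the terms were merged away.
        Map<String, Integer> merged = new LinkedHashMap<>();
        merged.put("a", 4);
        merged.put("c", 3);
        merged.put("b", 1);

        System.out.println(unmerged.equals(merged)); // prints false
        System.out.println(applyMincount(unmerged, 1).equals(applyMincount(merged, 1))); // prints true
    }
}
```

With zero-count buckets dropped on both sides, the two responses agree regardless of whether a background merge has run.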
> DistribCursorPagingTest non-reproducible failures in comparing facet counts
> ---------------------------------------------------------------------------
>
> Key: SOLR-12946
> URL: https://issues.apache.org/jira/browse/SOLR-12946
> Project: Solr
> Issue Type: Test
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Hoss Man
> Assignee: Hoss Man
> Priority: Major
> Attachments: risdenk-nuc-20181031-build-11.txt.gz,
> risdenk-nuc-20181031-build-31.txt.gz
>
>
> Anecdotal reports of failures from DistribCursorPagingTest along the lines of...
> {noformat}
> reproduce with: ant test -Dtestcase=DistribCursorPagingTest
> -Dtests.method=test -Dtests.seed=87B400F0180614A3 -Dtests.slow=true
> -Dtests.badapples=true -Dtests.locale=sq-AL -Dtests.timezone=Etc/GMT+9
> -Dtests.asserts=true -Dtests.file.encoding=UTF-8
> 23:47:46 [junit4] FAILURE 19.4s J7 | DistribCursorPagingTest.test <<<
> 23:47:46 [junit4] > Throwable #1: junit.framework.AssertionFailedError:
> .facet_counts.facet_fields.str.x:0!=null
> 23:47:46 [junit4] > at
> __randomizedtesting.SeedInfo.seed([87B400F0180614A3:FE03F2AB6FA795B]:0)
> 23:47:46 [junit4] > at junit.framework.Assert.fail(Assert.java:50)
> 23:47:46 [junit4] > at
> org.apache.solr.BaseDistributedSearchTestCase.compareSolrResponses(BaseDistributedSearchTestCase.java:985)
> 23:47:46 [junit4] > at
> org.apache.solr.BaseDistributedSearchTestCase.compareResponses(BaseDistributedSearchTestCase.java:1012)
> 23:47:46 [junit4] > at
> org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:666)
> 23:47:46 [junit4] > at
> org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:629)
> 23:47:46 [junit4] > at
> org.apache.solr.cloud.DistribCursorPagingTest.doSimpleTest(DistribCursorPagingTest.java:258)
> 23:47:46 [junit4] > at
> org.apache.solr.cloud.DistribCursorPagingTest.test(DistribCursorPagingTest.java:90)
> 23:47:46 [junit4] > at
> org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:1066)
> 23:47:46 [junit4] > at
> org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:1040)
> 23:47:46 [junit4] > at
> java.lang.Thread.run(Thread.java:748){noformat}