[ 
https://issues.apache.org/jira/browse/SOLR-12946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670489#comment-16670489
 ] 

Hoss Man commented on SOLR-12946:
---------------------------------

Thanks Kevin ... I was comparing your failure logs with logs from a successful run, 
expecting to see a delay in a commit or searcher open that would explain how the 
control collection might be out of sync with the distrib collection, but I 
didn't find anything -- however, the full log is more verbose about the 
mismatch in the responses...

{noformat}
   [junit4]   2> 88972 ERROR (TEST-DistribCursorPagingTest.test-seed#[87B400F0180614A3]) [    ] o.a.s.BaseDistributedSearchTestCase Mismatched responses:
   [junit4]   2> {responseHeader={zkConnected=true,status=0,QTime=31},response={numFound=8,start=0,docs=[SolrDocument{id=7}, SolrDocument{id=0}, SolrDocument{id=3}]},nextCursorMark=AoIGAAAAACEz,facet_counts={facet_queries={},facet_fields={str={a=4,c=3,b=1,x=0,z=0}},facet_ranges={},facet_intervals={},facet_heatmaps={}}}
   [junit4]   2> {responseHeader={zkConnected=true,status=0,QTime=10},response={numFound=8,start=0,docs=[SolrDocument{id=7}, SolrDocument{id=0}, SolrDocument{id=3}]},nextCursorMark=AoIGAAAAACEz,facet_counts={facet_queries={},facet_fields={str={a=4,c=3,b=1}},facet_ranges={},facet_intervals={},facet_heatmaps={}}}
{noformat}

...which helped me realize something I hadn't noticed before: the mismatched facet 
values from the failure are specifically for "x" (null != 0), and it would 
clearly fail for "z" as well once it got that far ... but those values aren't even 
expected at that point in the test -- all docs with those values have been 
deleted.
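To make the "x:0!=null" failure message concrete: a key-by-key comparison of the two "str" facet maps from the log above reports the first bucket that differs, and a bucket that exists with a count of 0 on one side but is absent on the other shows up as {{0!=null}}. The {{FacetDiff}} class below is a hypothetical, simplified stand-in written for illustration; it is not the actual logic of {{BaseDistributedSearchTestCase.compareSolrResponses}}.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

public class FacetDiff {
    /**
     * Hypothetical comparison: returns the first bucket whose counts differ,
     * formatted as "key:left!=right", or null if both maps agree.
     * A bucket missing from one side is reported as null.
     */
    public static String firstMismatch(Map<String, Integer> left, Map<String, Integer> right) {
        for (Map.Entry<String, Integer> e : left.entrySet()) {
            Integer other = right.get(e.getKey());
            if (!Objects.equals(e.getValue(), other)) {
                return e.getKey() + ":" + e.getValue() + "!=" + other;
            }
        }
        // Buckets that only exist on the right-hand side.
        for (Map.Entry<String, Integer> e : right.entrySet()) {
            if (!left.containsKey(e.getKey())) {
                return e.getKey() + ":null!=" + e.getValue();
            }
        }
        return null;
    }

    /** Builds an insertion-ordered bucket map from alternating term/count arguments. */
    public static Map<String, Integer> counts(Object... kv) {
        Map<String, Integer> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], (Integer) kv[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        // "str" facet buckets copied from the two mismatched responses.
        Map<String, Integer> first = counts("a", 4, "c", 3, "b", 1, "x", 0, "z", 0);
        Map<String, Integer> second = counts("a", 4, "c", 3, "b", 1);
        System.out.println(firstMismatch(first, second)); // x:0!=null
    }
}
```

Because the walk stops at the first difference, "z" is never reported even though it differs in exactly the same way.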

----

I'd bet money that this is where the discrepancy/failure is coming from...

* the docs with values 'x' and 'z' get deleted
* variability exists in what happens in the background before the next 
commit/newSearcher...
** on a "fast" machine (w/o thread contention), one of 2 things happens 
depending on the randomized merge policy in effect for the seed:
*** "1" a background merge happens on both collections
*** "2" a background merge does _not_ happen on either collection
** on a "slow" machine (w/thread contention) #1 and #2 may happen 
inconsistently across the two collections, and we get...
*** "3" a background merge happens on the control collection, but _not_ on a 
replica of the distrib collection
* both collections get the {{commit}} command from the test thread and open a 
new searcher
* the facet request is issued
** in case #1 or #2, the results are consistent between the control collection 
and the distrib collection -- the test itself does not verify all of the facet 
buckets returned, it only compares the two responses
** in case #3 the test framework detects a mismatch between the responses, 
because one collection still sees that the "x" and "z" terms exist in the index 
-- they haven't been merged away -- and returns "0" counts for them.
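The sequence above can be modeled with a toy index, purely as an illustration of the reasoning: a delete only marks a doc, a term stays in the index (and so keeps a zero-count bucket when every term gets one, i.e. an effective mincount of 0) until a merge physically drops the deleted docs. The {{ToyIndex}} class, its methods, and that simplified term-lifetime rule are assumptions made for this sketch, not Lucene or Solr APIs. Both sides start identical and delete the "x" and "z" docs; only one side gets the background merge, which is case "3".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class DeletedTermsSketch {
    /** Toy single-segment index (an assumption, not Lucene's API): one "str" value per doc. */
    public static final class ToyIndex {
        private final List<String> values = new ArrayList<>(); // docId -> "str" value
        private final Set<Integer> deleted = new HashSet<>();  // docIds marked as deleted

        public void add(String value) {
            values.add(value);
        }

        /** Deletes only mark docs; their terms stay in the index until a merge. */
        public void deleteByValue(String value) {
            for (int id = 0; id < values.size(); id++) {
                if (values.get(id).equals(value)) {
                    deleted.add(id);
                }
            }
        }

        /** Simulated background merge: deleted docs (and terms used only by them) disappear. */
        public void simulateMerge() {
            List<String> live = new ArrayList<>();
            for (int id = 0; id < values.size(); id++) {
                if (!deleted.contains(id)) {
                    live.add(values.get(id));
                }
            }
            values.clear();
            values.addAll(live);
            deleted.clear();
        }

        /** Counts live docs per term; every term still present gets a bucket (mincount=0). */
        public Map<String, Integer> facetCounts() {
            Map<String, Integer> counts = new TreeMap<>(); // sorted by term for readability
            for (int id = 0; id < values.size(); id++) {
                String term = values.get(id);
                counts.putIfAbsent(term, 0);
                if (!deleted.contains(id)) {
                    counts.merge(term, 1, Integer::sum);
                }
            }
            return counts;
        }
    }

    /** Builds the same starting state for both sides: add docs, then delete "x" and "z". */
    public static ToyIndex build() {
        ToyIndex idx = new ToyIndex();
        for (String v : new String[] {"a", "a", "b", "c", "x", "z"}) {
            idx.add(v);
        }
        idx.deleteByValue("x");
        idx.deleteByValue("z");
        return idx;
    }

    public static void main(String[] args) {
        ToyIndex control = build();
        ToyIndex replica = build();
        control.simulateMerge(); // case "3": the merge happens on one side only
        System.out.println(control.facetCounts()); // {a=2, b=1, c=1}
        System.out.println(replica.facetCounts()); // {a=2, b=1, c=1, x=0, z=0}
    }
}
```

Calling {{simulateMerge()}} on both sides (case "1") or on neither (case "2") makes the two outputs identical, which matches the expectation that only the inconsistent case fails.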

----

I'm not sure if there is a general-purpose lesson/fix that can be made here 
regarding the way background merges can happen in tests and the kinds of 
discrepancies in behavior they can cause, but this particular test can be 
tightened up by using {{facet.mincount=1}}, since that doesn't undermine the 
intent of the test.
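A minimal sketch of why {{facet.mincount=1}} removes the discrepancy, applied client-side to the two "str" bucket lists from the log above: once buckets with a count below 1 are dropped, the leftover "x=0" and "z=0" buckets no longer matter. {{applyMinCount}} and {{buckets}} are hypothetical helpers for illustration; in the real test the parameter would presumably be added to the facet request itself so that Solr does this filtering server-side.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MinCountSketch {
    /** Hypothetical helper: keeps only buckets whose count is at least minCount. */
    public static Map<String, Integer> applyMinCount(Map<String, Integer> buckets, int minCount) {
        Map<String, Integer> kept = new LinkedHashMap<>(); // preserves bucket order
        for (Map.Entry<String, Integer> e : buckets.entrySet()) {
            if (e.getValue() >= minCount) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    /** Builds an insertion-ordered bucket map from parallel arrays of terms and counts. */
    public static Map<String, Integer> buckets(String[] terms, int[] counts) {
        Map<String, Integer> m = new LinkedHashMap<>();
        for (int i = 0; i < terms.length; i++) {
            m.put(terms[i], counts[i]);
        }
        return m;
    }

    public static void main(String[] args) {
        Map<String, Integer> stale = buckets(new String[] {"a", "c", "b", "x", "z"}, new int[] {4, 3, 1, 0, 0});
        Map<String, Integer> merged = buckets(new String[] {"a", "c", "b"}, new int[] {4, 3, 1});
        System.out.println(stale.equals(merged)); // false: x=0 and z=0 exist on one side only
        System.out.println(applyMinCount(stale, 1).equals(applyMinCount(merged, 1))); // true
    }
}
```

The filter only hides zero-count buckets, so every non-zero count the test compares is left untouched, which is why it doesn't undermine the intent of the test.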

I'll look into that and try to commit ASAP (but first I want to see if I can 
semi-reliably reproduce it locally while hammering my machine with load)

> DistribCursorPagingTest non-reproducible failures in comparing facet counts
> ---------------------------------------------------------------------------
>
>                 Key: SOLR-12946
>                 URL: https://issues.apache.org/jira/browse/SOLR-12946
>             Project: Solr
>          Issue Type: Test
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>            Priority: Major
>         Attachments: risdenk-nuc-20181031-build-11.txt.gz, 
> risdenk-nuc-20181031-build-31.txt.gz
>
>
> Anecdotal reports of failures from DistribCursorPagingTest along the lines of...
> {noformat}
> reproduce with: ant test  -Dtestcase=DistribCursorPagingTest -Dtests.method=test -Dtests.seed=87B400F0180614A3 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=sq-AL -Dtests.timezone=Etc/GMT+9 -Dtests.asserts=true -Dtests.file.encoding=UTF-8
> 23:47:46    [junit4] FAILURE 19.4s J7 | DistribCursorPagingTest.test <<<
> 23:47:46    [junit4]    > Throwable #1: junit.framework.AssertionFailedError: .facet_counts.facet_fields.str.x:0!=null
> 23:47:46    [junit4]    >     at __randomizedtesting.SeedInfo.seed([87B400F0180614A3:FE03F2AB6FA795B]:0)
> 23:47:46    [junit4]    >     at junit.framework.Assert.fail(Assert.java:50)
> 23:47:46    [junit4]    >     at org.apache.solr.BaseDistributedSearchTestCase.compareSolrResponses(BaseDistributedSearchTestCase.java:985)
> 23:47:46    [junit4]    >     at org.apache.solr.BaseDistributedSearchTestCase.compareResponses(BaseDistributedSearchTestCase.java:1012)
> 23:47:46    [junit4]    >     at org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:666)
> 23:47:46    [junit4]    >     at org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:629)
> 23:47:46    [junit4]    >     at org.apache.solr.cloud.DistribCursorPagingTest.doSimpleTest(DistribCursorPagingTest.java:258)
> 23:47:46    [junit4]    >     at org.apache.solr.cloud.DistribCursorPagingTest.test(DistribCursorPagingTest.java:90)
> 23:47:46    [junit4]    >     at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:1066)
> 23:47:46    [junit4]    >     at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:1040)
> 23:47:46    [junit4]    >     at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
