[ 
https://issues.apache.org/jira/browse/SOLR-13696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907644#comment-16907644
 ] 

Hoss Man commented on SOLR-13696:
---------------------------------

Gus: can you please take a look at this?

based on my assessment, here's the crucial bits of the log..
{noformat}
hossman@tray:~/tmp/jenkins/DimensionalRoutedAliasUpdateProcessorTest$ grep 
testTimeCat__TRA__2019-07-05__CRA__calico 
thetaphi_Lucene-Solr-8.x-MacOSX_272.log.txt | egrep '(Opening 
\[Searcher|add=\[21|fq=cat_s:calico|\{\!terms\+f%3Did}21,20.*hits=2)'
   [junit4]   2> 4476175 INFO  (qtp759508539-75005) [n:127.0.0.1:55915_solr 
c:testTimeCat__TRA__2019-07-05__CRA__calico s:shard1 r:core_node5 
x:testTimeCat__TRA__2019-07-05__CRA__calico_shard1_replica_n2 ] 
o.a.s.s.SolrIndexSearcher Opening 
[Searcher@9bc49f7[testTimeCat__TRA__2019-07-05__CRA__calico_shard1_replica_n2] 
main]
   [junit4]   2> 4476176 INFO  (qtp1536738594-75022) [n:127.0.0.1:55916_solr 
c:testTimeCat__TRA__2019-07-05__CRA__calico s:shard1 r:core_node3 
x:testTimeCat__TRA__2019-07-05__CRA__calico_shard1_replica_n1 ] 
o.a.s.s.SolrIndexSearcher Opening 
[Searcher@5547583d[testTimeCat__TRA__2019-07-05__CRA__calico_shard1_replica_n1] 
main]
   [junit4]   2> 4476186 INFO  (qtp1998715126-75500) [n:127.0.0.1:55917_solr 
c:testTimeCat__TRA__2019-07-05__CRA__calico s:shard2 r:core_node8 
x:testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n7 ] 
o.a.s.s.SolrIndexSearcher Opening 
[Searcher@3d40f0e1[testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n7] 
main]
   [junit4]   2> 4476195 INFO  (qtp927691752-75020) [n:127.0.0.1:55918_solr 
c:testTimeCat__TRA__2019-07-05__CRA__calico s:shard2 r:core_node6 
x:testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n4 ] 
o.a.s.s.SolrIndexSearcher Opening 
[Searcher@18c82bc1[testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n4] 
main]
   [junit4]   2> 4477375 INFO  (qtp1998715126-75016) [n:127.0.0.1:55917_solr 
c:testTimeCat__TRA__2019-07-05__CRA__calico s:shard2 r:core_node8 
x:testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n7 ] 
o.a.s.u.p.LogUpdateProcessorFactory 
[testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n7]  webapp=/solr 
path=/update 
params={update.distrib=FROMLEADER&distrib.from=http://127.0.0.1:55918/solr/testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n4/&wt=javabin&version=2}{add=[21
 (1641811095092985856)]} 0 2
   [junit4]   2> 4477960 INFO  (qtp927691752-75506) [n:127.0.0.1:55918_solr 
c:testTimeCat__TRA__2019-07-05__CRA__calico s:shard2 r:core_node6 
x:testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n4 ] 
o.a.s.u.p.LogUpdateProcessorFactory 
[testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n4]  webapp=/solr 
path=/update 
params={update.distrib=NONE&df=_text_&alias.update.distrib=TOLEADER&distrib.from=http://127.0.0.1:55918/solr/testTimeCat__TRA__2019-07-02__CRA__calico_shard2_replica_n6/&wt=javabin&version=2&processor=inc}{add=[21
 (1641811095092985856)]} 0 590
   [junit4]   2> 4477962 INFO  (commitScheduler-24384-thread-1) [     ] 
o.a.s.s.SolrIndexSearcher Opening 
[Searcher@745b6c94[testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n4] 
main]
   [junit4]   2> 4478213 INFO  (qtp1998715126-75501) [n:127.0.0.1:55917_solr 
c:testTimeCat__TRA__2019-07-05__CRA__calico s:shard2 r:core_node8 
x:testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n7 ] 
o.a.s.c.S.Request [testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n7] 
 webapp=/solr path=/select 
params={q={!terms+f%3Did}21,20&rows=0&wt=javabin&version=2} hits=2 status=0 
QTime=13
   [junit4]   2> 4478408 INFO  (qtp1998715126-75016) [n:127.0.0.1:55917_solr 
c:testTimeCat__TRA__2019-07-05__CRA__calico s:shard2 r:core_node8 
x:testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n7 ] 
o.a.s.c.S.Request [testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n7] 
 webapp=/solr path=/select 
params={df=_text_&distrib=false&fl=id&fl=score&shards.purpose=516&start=0&fsv=true&fq=cat_s:calico&shard.url=http://127.0.0.1:55917/solr/testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n7/|http://127.0.0.1:55918/solr/testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n4/&rows=0&version=2&q=*:*&stats=true&omitHeader=false&NOW=1565753074817&isShard=true&wt=javabin&stats.field=timestamp_dt}
 hits=0 status=0 QTime=0
   [junit4]   2> 4478408 INFO  (qtp1536738594-75032) [n:127.0.0.1:55916_solr 
c:testTimeCat__TRA__2019-07-05__CRA__calico s:shard1 r:core_node3 
x:testTimeCat__TRA__2019-07-05__CRA__calico_shard1_replica_n1 ] 
o.a.s.c.S.Request [testTimeCat__TRA__2019-07-05__CRA__calico_shard1_replica_n1] 
 webapp=/solr path=/select 
params={df=_text_&distrib=false&fl=id&fl=score&shards.purpose=516&start=0&fsv=true&fq=cat_s:calico&shard.url=http://127.0.0.1:55916/solr/testTimeCat__TRA__2019-07-05__CRA__calico_shard1_replica_n1/|http://127.0.0.1:55915/solr/testTimeCat__TRA__2019-07-05__CRA__calico_shard1_replica_n2/&rows=0&version=2&q=*:*&stats=true&omitHeader=false&NOW=1565753074817&isShard=true&wt=javabin&stats.field=timestamp_dt}
 hits=0 status=0 QTime=0
   [junit4]   2> 4478408 INFO  (qtp1536738594-75031) [n:127.0.0.1:55916_solr 
c:testTimeCat__TRA__2019-07-05__CRA__calico s:shard1 r:core_node3 
x:testTimeCat__TRA__2019-07-05__CRA__calico_shard1_replica_n1 ] 
o.a.s.c.S.Request [testTimeCat__TRA__2019-07-05__CRA__calico_shard1_replica_n1] 
 webapp=/solr path=/select 
params={q=*:*&stats=true&fq=cat_s:calico&rows=0&wt=javabin&version=2&stats.field=timestamp_dt}
 hits=0 status=0 QTime=1
   [junit4]   2> 4478461 INFO  (commitScheduler-24386-thread-1) [     ] 
o.a.s.s.SolrIndexSearcher Opening 
[Searcher@4ab6f7d7[testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n7] 
main]
{noformat}
Summary:
 * doc 21 is added to {{testTimeCat__TRA__2019-07-05__CRA__calico}}
 * the complicated "check that it all got committed eventually" logic in 
{{addDocsAndCommit()}} finds the 21 (and 20) on one replica
 ** but meanwhile 
{{testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n7}} hasn't finished 
opening the searcher
 * the test continues and eventually gets to the 
{{"stats=true&fq=cat_s:calico"}} request
 ** {{testTimeCat__TRA__2019-07-05__CRA__calico_shard2_replica_n7}} still 
hasn't finished opening the searcher
 ** test fails to find this doc

I think either this test (helper method) needs to stop using "commitWithin" or 
(if variations in routed alias code paths / behavior exist when commitWithin is 
used and we definitely want to test that) then the "got committed eventually" 
logic needs to be more robust and check *every* core (ie: loop over all the 
replicas of all the shards until you find the doc on at least one, then loop 
over all the other replicas of that same shard until you see that doc on all of 
them)

(this problem doesn't seem specific to 
DimensionalRoutedAliasUpdateProcessorTest , RoutedAliasUpdateProcessorTest and 
any other subcalsses using {{addDocsAndCommit()}} are also going to be affected)
----
Also: this test (and i presume others like it esting the routed aliases) would 
be a lot more robust and easier to debug if:
 * it had more logging about what it was doing to use as "trace points" when 
reading the logs
 * it made simpler/tighter assertions earlier in the flow ie:
 ** do the "// verify that all the documents ended up in the right 
collections." frequently
 *** idealy at the begining of {{assertCatTimeInvariants()}} call
 *** use a hardcoded number for the total docs expected (ie: don't rely on 
{{lastDocId - numDocsDeletedOrFailed}} ) so that people reading/debugging the 
test can sanity check exactly how many docs are expected at every point
 **** the {{lastDocId - numDocsDeletedOrFailed}} is actually really brittle, 
and will cause lots of headaches if anyone tries to add more docs/days to the 
test because of how "router.1.autoDeleteAge" is used
 ** ideally: change {{assertCatTimeInvariants()}} to take the expected number 
of docs per collection (as a map) instead of just a list a list of the expected 
collection names
 *** that way we can assert the specified number of docs exist in _each_ 
collection (ie: don't just checks that the docs we do find have field values 
that match the collection name)
 *** but first: use the sum of all values in that map to assert the total 
number of docs found in the collection

> DimensionalRoutedAliasUpdateProcessorTest / RoutedAliasUpdateProcessorTest 
> failures due commitWithin/openSearcher delays
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13696
>                 URL: https://issues.apache.org/jira/browse/SOLR-13696
>             Project: Solr
>          Issue Type: Test
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Assignee: Gus Heck
>            Priority: Major
>         Attachments: thetaphi_Lucene-Solr-8.x-MacOSX_272.log.txt
>
>
> Recent jenkins failure...
> Build: https://jenkins.thetaphi.de/job/Lucene-Solr-8.x-MacOSX/272/
> Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseParallelGC
> {noformat}
> Stack Trace:
> java.lang.AssertionError: expected:<16> but was:<15>
>         at 
> __randomizedtesting.SeedInfo.seed([DB6DC28D5560B1D2:E295833E1541FDB9]:0)
>         at org.junit.Assert.fail(Assert.java:88)
>         at org.junit.Assert.failNotEquals(Assert.java:834)
>         at org.junit.Assert.assertEquals(Assert.java:645)
>         at org.junit.Assert.assertEquals(Assert.java:631)
>         at
> org.apache.solr.update.processor.DimensionalRoutedAliasUpdateProcessorTest.assertCatTimeInvariants(DimensionalRoutedAliasUpdateProcessorTest.java:677
> )
>         at 
> org.apache.solr.update.processor.DimensionalRoutedAliasUpdateProcessorTest.testTimeCat(DimensionalRoutedAliasUpdateProcessorTest.java:282)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> {noformat}
> Digging into the logs, the problem appears to be in the way the test 
> verifies/assumes docs have been committed.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to