[ https://issues.apache.org/jira/browse/SOLR-10109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hoss Man updated SOLR-10109: ---------------------------- Attachment: SoftAutoCommitTest.2017-02-25.2938.log.txt SoftAutoCommitTest.2017-05-02.3983.log.txt [~markrmil...@gmail.com]: Since your commit in feb, the number of jenkins failures for SoftAutoCommitTest has dropped noticeably -- well done! There have only been 2 jenkins failures, since then -- I don't have the full logs, but i'm attaching the bodies of the jenkins emails. In both cases, the failure was the same assertion in {{testSoftAndHardCommitMaxTimeRapidAdds}}... {noformat} java.lang.AssertionError: 2: hard wasn't fast enough at __randomizedtesting.SeedInfo.seed([FF159C05808D08C6:A300323C6B0F49BE]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertNotNull(Assert.java:526) at org.apache.solr.update.SoftAutoCommitTest.testSoftAndHardCommitMaxTimeRapidAdds(SoftAutoCommitTest.java:382) {noformat} If you look at the timing info from the logs, and do some simple math, the problem seems clear... {noformat} [junit4] 2> 2824567 INFO (TEST-SoftAutoCommitTest.testSoftAndHardCommitMaxTimeRapidAdds-seed#[FF159C05808D08C6]) [ ] o.a.s.SolrTestCaseJ4 ###Starting testSoftAndHardCommitMaxTimeRapidAdds [junit4] 2> 2832001 INFO (TEST-SoftAutoCommitTest.testSoftAndHardCommitMaxTimeRapidAdds-seed#[FF159C05808D08C6]) [ ] o.a.s.u.p.LogUpdateProcessorFactory [collection1] webapp=null path=null params={}{add=[5000 (1566326219147837440)]} 0 2424 [junit4] 2> 2832002 INFO (TEST-SoftAutoCommitTest.testSoftAndHardCommitMaxTimeRapidAdds-seed#[FF159C05808D08C6]) [ ] o.a.s.u.p.LogUpdateProcessorFactory [collection1] webapp=null path=null params={}{add=[5001 (1566326221690634240)]} 0 0 [junit4] 2> 2832003 INFO (TEST-SoftAutoCommitTest.testSoftAndHardCommitMaxTimeRapidAdds-seed#[FF159C05808D08C6]) [ ] o.a.s.u.p.LogUpdateProcessorFactory [collection1] webapp=null path=null params={}{add=[5002 (1566326221690634241)]} 0 0 [junit4] 2> 2832003 INFO (TEST-SoftAutoCommitTest.testSoftAndHardCommitMaxTimeRapidAdds-seed#[FF159C05808D08C6]) [ ] o.a.s.u.p.LogUpdateProcessorFactory [collection1] webapp=null path=null params={}{add=[5003 (1566326221691682816)]} 0 0 [junit4] 2> 2832004 INFO (TEST-SoftAutoCommitTest.testSoftAndHardCommitMaxTimeRapidAdds-seed#[FF159C05808D08C6]) [ ] o.a.s.u.p.LogUpdateProcessorFactory [collection1] webapp=null path=null params={}{add=[5004 (1566326221691682817)]} 0 0 [junit4] 2> 2832506 INFO (commitScheduler-7474-thread-1) [ ] o.a.s.u.DirectUpdateHandler2 start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false} [junit4] 2> 2832509 INFO (commitScheduler-7474-thread-1) [ ] o.a.s.s.SolrIndexSearcher Opening [Searcher@1d85065c[collection1] main] [junit4] 2> 2832509 INFO (commitScheduler-7474-thread-1) [ ] o.a.s.u.DirectUpdateHandler2 end_commit_flush [junit4] 2> 2832511 INFO (searcherExecutor-7470-thread-1) [ ] o.a.s.c.SolrCore [collection1] Registered new searcher Searcher@1d85065c[collection1] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_1(7.0.0):c1) Uninverting(_2(7.0.0):c1) Uninverting(_3(7.0.0):c1) Uninverting(_4(7.0.0):c5)))} [junit4] 2> 2833207 INFO (commitScheduler-7473-thread-1) [ ] o.a.s.u.DirectUpdateHandler2 start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} {noformat} If you compare the timstamp when the first doc *finished* being added (2832001) with the timstamp of when the hard commit *started* (2833207) then that's almost exactly the 1200ms hardCommitWaitMillis ({{2833207 - 2832001 = 1206}}) and it's understandable that only 1 hard commit happened since all of the docs were added in under 12 seconds. But the test requires the {{fast5start}} timestamp _before_ adding the first document, and as we can see from the {{add=[5000 (1566326219147837440)]} 0 2424}} log line that request was stalled for 2.4seconds -- which probably explains why the test thread thinks the 5 adds took longer then 12seconds and expects more then 1 hardCommit. (it's not clear since the test does actaully log fast5start or fast5end) I think the right fix is to do something like this... {code} // record pre/post times for start/stop to account for flux in when add // recorded by commit tracker for first/last doc in the set final long preStart = System.nanoTime(); assertU(adoc("id",5000)); final long postStart = System.nanoTime(); assertU(adoc("id",5001)); assertU(adoc("id",5002)); assertU(adoc("id",5003)); final long preEnd = System.nanoTime(); assertU(adoc("id",5004)); final long postEnd = System.nanoTime(); final long minTime = preEnd - postStart; final long maxTime = postEnd - preStart; // NOTE: explicitly useing truncated division of longs to round down for "min" values final long minExpectedSoft = Math.max(1, minTime / softCommitWaitMillis); final long minExpectedHard = Math.max(1, minTime / hardCommitWaitMillis); // round up for "max" values final long maxExpectedSoft = Math.max(1, Math.ceil(((double)maxTime) / softCommitWaitMillis); final long maxExpectedHard = Math.max(1, Math.ceil(((double)maxTime) / hardCommitWaitMillis); // TODO.... // now assert we can poll each type for the minExpectedFoo and not get nulls // then accept that we *might* have up to 'maxExpectedFoo-minExpectedFoo' non-null values as well {code} Unless I'm missing something, that would preserve the spirit of the test, and account for possible slow machine lag as well. I'll start working on a patch (and include some nocommit random sleeps to help test for the possibility of this lag) but in the meantime if you see any flaws in this approach please ping me ASAP. > SoftAutoCommitTest is too fragile. > ---------------------------------- > > Key: SOLR-10109 > URL: https://issues.apache.org/jira/browse/SOLR-10109 > Project: Solr > Issue Type: Test > Security Level: Public(Default Security Level. Issues are Public) > Reporter: Mark Miller > Assignee: Mark Miller > Attachments: SoftAutoCommitTest.2017-02-25.2938.log.txt, > SoftAutoCommitTest.2017-05-02.3983.log.txt > > > In my beast test runs it has failed 7% of the time, then 3% of the time, then > when I started using hardware with 8 cores instead of 16, 40% of the time > (unless something else also shifted). That gives it a test rating of 'screwy'. -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org