Alexandre: Feel free! What I'm struggling with is not that someone checked in some code that all of a sudden started breaking things. Rather, it's that a test that's been working perfectly will fail once, then won't reproducibly fail again, and does _not_ appear to be related to recent code changes.
In fact, that's the crux of the matter: it's difficult/impossible to tell at a glance when a test fails whether it is or is not related to a recent code change.....

Erick

On Wed, Aug 1, 2018 at 8:05 AM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> Just a completely random thought that I do not have deep knowledge for
> (still learning my way around Solr tests).
>
> Is this something that Machine Learning could help with? The Github
> repo/history is a fantastic source of learning on who worked on which
> file, how often, etc. We certainly should be able to get some 'most
> significant developer' stats out of that.
>
> Regards,
>    Alex.
>
> On 1 August 2018 at 10:56, Erick Erickson <erickerick...@gmail.com> wrote:
>> Shawn:
>>
>> Trouble is, there were 945 tests that failed at least once in the last
>> 4 weeks. And the trend is all over the map on a weekly basis.
>>
>> e-mail-2018-06-11.txt: There were 989 unannotated tests that failed
>> e-mail-2018-06-18.txt: There were 689 unannotated tests that failed
>> e-mail-2018-06-25.txt: There were 555 unannotated tests that failed
>> e-mail-2018-07-02.txt: There were 723 unannotated tests that failed
>> e-mail-2018-07-09.txt: There were 793 unannotated tests that failed
>> e-mail-2018-07-16.txt: There were 809 unannotated tests that failed
>> e-mail-2018-07-23.txt: There were 953 unannotated tests that failed
>> e-mail-2018-07-30.txt: There were 945 unannotated tests that failed
>>
>> I'm BadApple'ing tests that have failed every week for the last 4 weeks,
>> on the theory that those are not temporary issues (hey, we all commit
>> code that breaks something and then have to figure out why and fix it).
>>
>> I also have the feeling that somewhere, somehow, our test framework is
>> making some assumptions that are invalid. Or too strict. Or too fast.
>> Or there's some fundamental issue with some of our classes. Or... The
>> number of sporadic issues where the Object Tracker spits stuff out, for
>> instance, screams that some assumption we're making, either in the code
>> or in the test framework, is flawed.
>>
>> What I don't know is how to make visible progress. It's discouraging
>> to fix something and then next week have more tests fail for unrelated
>> reasons.
>>
>> Visibility is the issue to me. We have no good way of saying "these
>> tests _just_ started failing for a reason". As a quick experiment, I
>> extended the triage to 10 weeks (no attempt to ascertain whether these
>> tests even existed 10 weeks ago). Here are the tests that have _only_
>> failed in the last week, not the previous 9. BadApple'ing anything
>> that's only failed once seems like overkill.
>>
>> The test that failed 77 times does stand out, though....
>>
>> week   pct   runs  fails  test
>>    0   0.2    460      1  CloudSolrClientTest.testVersionsAreReturned
>>    0   0.2    466      1  ComputePlanActionTest.testSelectedCollections
>>    0   0.2    464      1  ConfusionMatrixGeneratorTest.testGetConfusionMatrixWithBM25NB
>>    0   8.1     37      3  IndexSizeTriggerTest(suite)
>>    0   0.2    454      1  MBeansHandlerTest.testAddedMBeanDiff
>>    0   0.2    454      1  MBeansHandlerTest.testDiff
>>    0   0.2    455      1  MetricTriggerTest.test
>>    0   0.2    455      1  MetricsHandlerTest.test
>>    0   0.2    455      1  MetricsHandlerTest.testKeyMetrics
>>    0   0.2    453      1  RequestHandlersTest.testInitCount
>>    0   0.2    453      1  RequestHandlersTest.testStatistics
>>    0   0.2    453      1  ScheduledTriggerIntegrationTest(suite)
>>    0   0.2    451      1  SearchRateTriggerTest.testWaitForElapsed
>>    0   0.2    425      1  SoftAutoCommitTest.testSoftCommitWithinAndHardCommitMaxTimeRapidAdds
>>    0  14.7    525     77  StreamExpressionTest.testSignificantTermsStream
>>    0   0.2    454      1  TestBadConfig(suite)
>>    0   0.2    465      1  TestBlockJoin.testMultiChildQueriesOfDiffParentLevels
>>    0   0.6    462      3  TestCloudCollectionsListeners.testCollectionDeletion
>>    0   0.2    456      1  TestInfoStreamLogging(suite)
>>    0   0.2    456      1  TestLazyCores.testLazySearch
>>    0   0.2    473      1  TestLucene70DocValuesFormat.testSortedSetAroundBlockSize
>>    0  15.4     26      4  TestMockDirectoryWrapper.testThreadSafetyInListAll
>>    0   0.2    454      1  TestNodeLostTrigger.testTrigger
>>    0   0.2    453      1  TestRecovery.stressLogReplay
>>    0   0.2    505      1  TestReplicationHandler.testRateLimitedReplication
>>    0   0.2    425      1  TestSolrCloudWithSecureImpersonation.testForwarding
>>    0   0.9    461      4  TestSolrDeletionPolicy1.testNumCommitsConfigured
>>    0   0.2    454      1  TestSystemIdResolver(suite)
>>    0   0.2    451      1  TestV2Request.testCloudSolrClient
>>    0   0.2    451      1  TestV2Request.testHttpSolrClient
>>    0   9.1     77      7  TestWithCollection.testDeleteWithCollection
>>    0   3.9     77      3  TestWithCollection.testMoveReplicaWithCollection
>>
>> So I don't know what I'm going to do here; we'll see if I get more
>> optimistic when the fog lifts.
>>
>> Erick
>>
>> On Wed, Aug 1, 2018 at 7:15 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>>> On 7/30/2018 11:52 AM, Erick Erickson wrote:
>>>>
>>>> Is anybody paying the least attention to this or should I just stop
>>>> bothering?
>>>
>>> The job you're doing is thankless. That's the nature of the work. I'd love
>>> to have the time to really help you out. If only my employer didn't expect
>>> me to spend so much time *working*!
>>>
>>>> I'd hoped to get to a point where we could get at least semi-stable
>>>> and start whittling away at the backlog. But with an additional 63
>>>> tests to BadApple (a little fudging here because of some issues with
>>>> counting suite-level tests vs. individual tests) it doesn't seem like
>>>> we're going in the right direction at all.
>>>>
>>>> Unless there's some value here, defined by people stepping up and at
>>>> least looking (and once a week is not asking too much) at the names of
>>>> the tests I'm going to BadApple to see if they ring any bells, I'll
>>>> stop wasting my time.
>>>
>>> Here's a crazy thought, which might be something you already considered:
>>> try to figure out which tests pass consistently and BadApple *all the rest*
>>> of the Solr tests. If there are any Lucene tests that fail with some
>>> regularity, BadApple those too.
>>>
>>> There are probably disadvantages to this approach, but here are the
>>> advantages I can think of: 1) The noise stops quickly. 2) Future heroic
>>> efforts will result in measurable progress -- to quote you, "whittling away
>>> at the backlog."
>>>
>>> Thank you a million times over for all the care and effort you've put into
>>> this.
>>>
>>> Shawn

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
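[Editorial note] For readers not familiar with what "BadApple'ing" a test in the thread above actually involves: the Lucene test framework exposes a @BadApple annotation on LuceneTestCase, and the weekly counts above refer to failures from tests that are "unannotated", i.e. not marked this way. The following is a minimal, hypothetical sketch; the class name, test method, and bugUrl are placeholders, not a real Solr test or JIRA issue.

    import org.apache.lucene.util.LuceneTestCase;
    import org.apache.lucene.util.LuceneTestCase.BadApple;
    import org.junit.Test;

    public class FlakyExampleTest extends LuceneTestCase {

      // Hypothetical flaky test. The annotation marks it as a known
      // "bad apple" so that report scripts can separate known-flaky
      // tests from tests that just started failing. The bugUrl below
      // is a placeholder; real annotations point at the tracking
      // JIRA issue for the flaky behavior.
      @Test
      @BadApple(bugUrl = "https://issues.apache.org/jira/browse/SOLR-XXXXX")
      public void testSometimesFlaky() throws Exception {
        assertTrue("placeholder standing in for the real assertion", true);
      }
    }

Whether annotated tests actually execute is controlled by the tests.badapples system property (for example, -Dtests.badapples=false skips them), so BadApple'ing a test quiets the noise in the reports without deleting the test itself.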