I still think it’s a mistake to try to use all the Jenkins results to
drive ignoring tests. That needs to be an objective measure taken in a
known good env.

We also should not be ignoring tests en masse without individual
consideration. Critical test coverage should be treated differently from
any random test, especially when stability is sometimes simple to achieve
for that test.

A decade+ of history says it’s unlikely you’ll get much consistent help
digging out of a huge test-ignore hell.

Beasting in a known good environment, with a few very interested
parties, is the only path out of this if you ask me. We need to get
clean in a known good env and then automate a beasting defense, using
Jenkins to find issues in other environments.
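For anyone unfamiliar, beasting just means hammering one test over and over in a controlled environment until it either proves stable or reproduces. A minimal sketch of the loop; the command here is a placeholder, since the real invocation (an ant/gradle target selecting one test class and its flags) is build-specific:

```python
import subprocess

def beast(cmd, iters=20):
    """Run one test command `iters` times; return how many runs failed.

    `cmd` is a placeholder argv list -- substitute the project's real
    test invocation (e.g. an ant target selecting a single test class).
    """
    fails = 0
    for _ in range(iters):
        # A nonzero exit code counts as a failing run.
        if subprocess.run(cmd, capture_output=True).returncode != 0:
            fails += 1
    return fails
```

A test that stays clean across, say, 100 such runs in a known good env but still fails on Jenkins points at the environment rather than the test.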

Unfortunately, not something I can help out with in the short term anymore.

Mark
On Wed, Aug 1, 2018 at 8:10 AM Erick Erickson <erickerick...@gmail.com>
wrote:

> Alexandre:
>
> Feel free! What I'm struggling with is not that someone checked in
> some code that all of a sudden started breaking things. Rather, it's
> that a test that's been working perfectly will fail once, then won't
> reproducibly fail again, and does _not_ appear to be related to recent
> code changes.
>
> In fact that's the crux of the matter: it's difficult/impossible to
> tell at a glance when a test fails whether it is or is not related to
> a recent code change...
>
> Erick
>
> On Wed, Aug 1, 2018 at 8:05 AM, Alexandre Rafalovitch
> <arafa...@gmail.com> wrote:
> > Just a completely random thought that I do not have deep knowledge for
> > (still learning my way around Solr tests).
> >
> > Is this something that Machine Learning could help with? The Github
> > repo/history is a fantastic source of learning on who worked on which
> > file, how often, etc. We certainly should be able to get some 'most
> > significant developer' stats out of that.
> >
> > Regards,
> >    Alex.
> >
> > On 1 August 2018 at 10:56, Erick Erickson <erickerick...@gmail.com>
> wrote:
> >> Shawn:
> >>
> >> Trouble is there were 945 tests that failed at least once in the last
> >> 4 weeks. And the trend is all over the map on a weekly basis.
> >>
> >> e-mail-2018-06-11.txt: There were 989 unannotated tests that failed
> >> e-mail-2018-06-18.txt: There were 689 unannotated tests that failed
> >> e-mail-2018-06-25.txt: There were 555 unannotated tests that failed
> >> e-mail-2018-07-02.txt: There were 723 unannotated tests that failed
> >> e-mail-2018-07-09.txt: There were 793 unannotated tests that failed
> >> e-mail-2018-07-16.txt: There were 809 unannotated tests that failed
> >> e-mail-2018-07-23.txt: There were 953 unannotated tests that failed
> >> e-mail-2018-07-30.txt: There were 945 unannotated tests that failed
> >>
> >> I'm BadApple'ing tests that fail every week for the last 4 weeks, on
> >> the theory that those are not temporary issues (hey, we all commit
> >> code that breaks something and then have to figure out why and fix it).
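The triage rule Erick describes (BadApple anything that failed in every one of the last four weekly reports, and flag anything that failed only in the latest week) is easy to mechanize. A rough sketch, assuming each weekly report has already been parsed into a set of failing test names:

```python
def badapple_candidates(weekly_failures, window=4):
    """Tests that failed in every one of the last `window` weekly reports.

    `weekly_failures` is a list of sets of test names, oldest first.
    """
    recent = weekly_failures[-window:]
    candidates = set(recent[0])
    for week in recent[1:]:
        candidates &= week  # keep only tests failing every week
    return candidates

def new_failures(weekly_failures):
    """Tests that failed only in the most recent week, never earlier."""
    latest = set(weekly_failures[-1])
    earlier = set().union(*weekly_failures[:-1]) if len(weekly_failures) > 1 else set()
    return latest - earlier
```

How the weekly e-mail reports get parsed into those sets is left out here; the point is just that both lists fall out of plain set intersection and difference once the data is in that shape.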
> >>
> >> I also have the feeling that somewhere, somehow, our test framework is
> >> making some assumptions that are invalid. Or too strict. Or too fast.
> >> Or there's some fundamental issue with some of our classes. Or... The
> >> number of sporadic issues where the Object Tracker spits stuff out, for
> >> instance, screams that some assumption we're making, either in the code
> >> or in the test framework, is flawed.
> >>
> >> What I don't know is how to make visible progress. It's discouraging
> >> to fix something and then next week have more tests fail for unrelated
> >> reasons.
> >>
> >> Visibility is the issue to me. We have no good way of saying "these
> >> tests _just started_ failing for a reason." As a quick experiment, I
> >> extended the triage to 10 weeks (no attempt to ascertain whether these
> >> tests even existed 10 weeks ago). Here are the tests that have _only_
> >> failed in the last week, not the previous 9. BadApple'ing anything
> >> that's only failed once seems like overkill.
> >>
> >> Although the test that failed 77 times does just stand out....
> >>
> >> week   pct   runs  fails  test
> >> 0      0.2   460   1      CloudSolrClientTest.testVersionsAreReturned
> >> 0      0.2   466   1      ComputePlanActionTest.testSelectedCollections
> >> 0      0.2   464   1      ConfusionMatrixGeneratorTest.testGetConfusionMatrixWithBM25NB
> >> 0      8.1   37    3      IndexSizeTriggerTest(suite)
> >> 0      0.2   454   1      MBeansHandlerTest.testAddedMBeanDiff
> >> 0      0.2   454   1      MBeansHandlerTest.testDiff
> >> 0      0.2   455   1      MetricTriggerTest.test
> >> 0      0.2   455   1      MetricsHandlerTest.test
> >> 0      0.2   455   1      MetricsHandlerTest.testKeyMetrics
> >> 0      0.2   453   1      RequestHandlersTest.testInitCount
> >> 0      0.2   453   1      RequestHandlersTest.testStatistics
> >> 0      0.2   453   1      ScheduledTriggerIntegrationTest(suite)
> >> 0      0.2   451   1      SearchRateTriggerTest.testWaitForElapsed
> >> 0      0.2   425   1      SoftAutoCommitTest.testSoftCommitWithinAndHardCommitMaxTimeRapidAdds
> >> 0      14.7  525   77     StreamExpressionTest.testSignificantTermsStream
> >> 0      0.2   454   1      TestBadConfig(suite)
> >> 0      0.2   465   1      TestBlockJoin.testMultiChildQueriesOfDiffParentLevels
> >> 0      0.6   462   3      TestCloudCollectionsListeners.testCollectionDeletion
> >> 0      0.2   456   1      TestInfoStreamLogging(suite)
> >> 0      0.2   456   1      TestLazyCores.testLazySearch
> >> 0      0.2   473   1      TestLucene70DocValuesFormat.testSortedSetAroundBlockSize
> >> 0      15.4  26    4      TestMockDirectoryWrapper.testThreadSafetyInListAll
> >> 0      0.2   454   1      TestNodeLostTrigger.testTrigger
> >> 0      0.2   453   1      TestRecovery.stressLogReplay
> >> 0      0.2   505   1      TestReplicationHandler.testRateLimitedReplication
> >> 0      0.2   425   1      TestSolrCloudWithSecureImpersonation.testForwarding
> >> 0      0.9   461   4      TestSolrDeletionPolicy1.testNumCommitsConfigured
> >> 0      0.2   454   1      TestSystemIdResolver(suite)
> >> 0      0.2   451   1      TestV2Request.testCloudSolrClient
> >> 0      0.2   451   1      TestV2Request.testHttpSolrClient
> >> 0      9.1   77    7      TestWithCollection.testDeleteWithCollection
> >> 0      3.9   77    3      TestWithCollection.testMoveReplicaWithCollection
> >>
> >> So I don't know what I'm going to do here; we'll see if I get more
> >> optimistic when the fog lifts.
> >>
> >> Erick
> >>
> >> On Wed, Aug 1, 2018 at 7:15 AM, Shawn Heisey <apa...@elyograg.org>
> wrote:
> >>> On 7/30/2018 11:52 AM, Erick Erickson wrote:
> >>>>
> >>>> Is anybody paying the least attention to this or should I just stop
> >>>> bothering?
> >>>
> >>>
> >>> The job you're doing is thankless.  That's the nature of the work.
> I'd love
> >>> to have the time to really help you out. If only my employer didn't
> expect
> >>> me to spend so much time *working*!
> >>>
> >>>> I'd hoped to get to a point where we could get at least semi-stable
> >>>> and start whittling away at the backlog. But with an additional 63
> >>>> tests to BadApple (a little fudging here because of some issues with
> >>>> counting suite-level tests vs. individual tests) it doesn't seem like
> >>>> we're going in the right direction at all.
> >>>>
> >>>> Unless there's some value here, defined by people stepping up and at
> >>>> least looking (and once a week is not asking too much) at the names of
> >>>> the tests I'm going to BadApple to see if they ring any bells, I'll
> >>>> stop wasting my time.
> >>>
> >>>
> >>> Here's a crazy thought, which might be something you already
> considered:
> >>> Try to figure out which tests pass consistently and BadApple *all the
> rest*
> >>> of the Solr tests.  If there are any Lucene tests that fail with some
> >>> regularity, BadApple those too.
> >>>
> >>> There are probably disadvantages to this approach, but here are the
> >>> advantages I can think of:  1) The noise stops quickly. 2) Future
> heroic
> >>> efforts will result in measurable progress -- to quote you, "whittling
> away
> >>> at the backlog."
> >>>
> >>> Thank you a million times over for all the care and effort you've put
> into
> >>> this.
> >>>
> >>> Shawn
> >>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >>> For additional commands, e-mail: dev-h...@lucene.apache.org
> >>>
> >>
> >
>
>
> --
- Mark
about.me/markrmiller
