Re: Test failures are out of control......

Erick Erickson Wed, 21 Feb 2018 10:09:01 -0800

Dawid:
Yep, definitely a recurring theme. But this time I may actually, you
know, do something about it ;)


Mark is one of the advocates of this theme; perhaps he got exhausted
trying to push that stone up the hill ;). Maybe it's my turn to pick
up the baton.... Comments about there being value in seeing these are
well taken, but outweighed IMO by the harm in there being so much
noise that failures that _should_ get attention are so easy to
overlook.

bq: The noise in Solr tests have increased to a degree that I stopped looking.

Exactly. To one degree or another I think this has happened to a _lot_
of people, myself certainly included.

And you've certainly done more than your share of fixing things in the
infrastructure, many thanks!

----------------

I'm not sure blanket @BadApple-ing these is The Right Thing To Do for
_all_ of them though, as I know lots of active work is being done in
some areas. I'd hate for someone who is currently trying to fix
something to see the failures disappear and think they were fixed
when in reality the tests just weren't run.

Straw-man proposal:

> I'll volunteer to gather failing tests through the next few days from the dev 
> e-mails. I'll create yet another umbrella JIRA that proposes to @BadApple 
> _all_ of them unless someone steps up and volunteers to actively work on a 
> particular test failure. Since I brought it up I'll get aggressive about 
> @BadApple-ing failing tests in future. I'll link the current JIRAs for 
> failing tests in as well (on a cursory glance there are 16 open ones)...

> If someone objects to @BadApple-ing a particular test, they should create a 
> JIRA, assign it to themselves and actively work on it. Short shrift given to 
> "I don't think we should @BadApple that test because someday someone might 
> want to try to fix it".... In this proposal, it's perfectly acceptable to 
> remove the @BadApple notation and push it, as long as it's being actively 
> worked on.

> Would someone who knows the test infrastructure better than me be willing to 
> volunteer to set up a periodic run with BadApple annotations disabled? 
> Perhaps weekly? Even nightly? That way interested parties can see these but 
> the rest of us would only have _one_ e-mail to ignore, not 10-20 a day. It'd 
> be great if the subject mentioned something allowing the WithBadApple runs to 
> be identified just by glancing at the subject..... Then errors that didn't 
> have BadApple annotations would stand out from the noise since they would be 
> in _other_ emails.

> It's easy enough to find all the BadApple-labeled tests, I'll also volunteer 
> to post a weekly list.
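
A rough sketch of how such a weekly list could be generated. It assumes the annotation appears in source as `@BadApple` or `@LuceneTestCase.BadApple`, optionally with a `bugUrl = "..."` attribute on the same line. The function name `find_bad_apples` and the shape of the result are made up for illustration; this is not an existing script in the project:

```python
import re
from pathlib import Path

# Assumption: the annotation is written as @BadApple or @LuceneTestCase.BadApple,
# with an optional bugUrl = "..." attribute on the same line.
BAD_APPLE = re.compile(
    r'@(?:LuceneTestCase\.)?BadApple\b(?:\s*\(\s*bugUrl\s*=\s*"([^"]*)"\s*\))?'
)


def find_bad_apples(root):
    """Return (relative path, line number, bugUrl or None) for each annotation found."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*.java")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            # Skip annotations that have been commented out with "//".
            if line.lstrip().startswith("//"):
                continue
            match = BAD_APPLE.search(line)
            if match:
                hits.append((path.relative_to(root).as_posix(), lineno, match.group(1)))
    return hits


if __name__ == "__main__":
    # Hypothetical usage: run from the top of a source checkout.
    for rel_path, lineno, bug_url in find_bad_apples("."):
        print(f"{rel_path}:{lineno}  {bug_url or '(no bugUrl)'}")
```

This only looks at one line at a time, so an annotation split across several lines would be reported without its bugUrl, and annotations inside block comments would still be counted.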

Getting e-mails for flakey tests is acceptable IMO only if people are
working on them. I've certainly been in situations where I can't get
something to fail locally and have to rely on Jenkins etc to gather
logging info or see if my fixes really work. I _do_ care that we are
accumulating more and more failures and it's getting harder and harder
to know when failures are a function of new code or not.
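
A minimal sketch of that "new code or not" comparison, assuming failure reports use the `FAILED: <test name>` lines seen in the Jenkins e-mails. It tolerates the name being wrapped onto the next line and leading `>` quote markers. The function names are hypothetical, and a test that also fails on master is not thereby proven unrelated to a change; it is only a candidate for "not my problem":

```python
import re

# "FAILED:" followed by the test name, on the same line or wrapped onto the
# next one; leading ">" quote markers from replies are tolerated.
FAILED = re.compile(r"^[> \t]*FAILED:[ \t]*(?:\n[> \t]*)?(\S+)", re.MULTILINE)
SUITE_PREFIX = "junit.framework.TestSuite."


def failed_tests(report_text):
    """Return the set of names reported as FAILED, with the suite prefix stripped."""
    names = set()
    for name in FAILED.findall(report_text):
        if name.startswith(SUITE_PREFIX):
            name = name[len(SUITE_PREFIX):]
        names.add(name)
    return names


def split_by_baseline(branch_failures, master_failures):
    """Partition branch failures into (also fail on master, fail only on the branch)."""
    also_on_master = sorted(branch_failures & master_failures)
    branch_only = sorted(branch_failures - master_failures)
    return also_on_master, branch_only
```

Feeding it the text of several reports from a branch and from master gives two sorted lists, and the second one is the short list worth reading closely.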

WDYT?
Erick

On Wed, Feb 21, 2018 at 8:36 AM, Tommaso Teofili
<tommaso.teof...@gmail.com> wrote:
> +1, agree with Adrien, thanks for bringing this up Erick!
>
>
>
> On Wed, Feb 21, 2018 at 5:15 PM, Adrien Grand <jpou...@gmail.com>
> wrote:
>>
>> Thanks for bringing this up Erick. I agree with you we should silence
>> those frequent failures. Like you said, the side-effects of not silencing
>> them are even worse. I'll add that these flaky tests also make releasing
>> harder, it took me three runs last time (Lucene/Solr 7.2) for the release
>> build to succeed because of failed tests.
>>
>> On Wed, Feb 21, 2018 at 4:52 PM, Erick Erickson <erickerick...@gmail.com>
>> wrote:
>>>
>>> There's an elephant in the room, and it's that failing tests are being
>>> ignored. Mind you, Solr and Lucene are progressing at a furious pace
>>> with lots of great functionality being added. That said, we're
>>> building up a considerable "technical debt" when it comes to testing.
>>>
>>> And I should say up front that major new functionality is expected to
>>> take a while to shake out (e.g. autoscaling, streaming, V2 API etc.),
>>> and noise from tests of new functionality is expected while things
>>> bake.
>>>
>>> Below is a list of tests that have failed at least once since just
>>> last night. This has been getting worse as time passes, the broken
>>> window problem. Some e-mails have 10 failing tests (+/-) so unless I
>>> go through each and every one I don't know whether something I've done
>>> is causing a problem or not.
>>>
>>> I'm as guilty of letting things slide as anyone else. For instance,
>>> there's been a long-standing issue with TestLazyCores that I work on
>>> sporadically and that's _probably_ "something in the test framework"....
>>>
>>> Several folks have spent some time digging into test failures and
>>> identifying at least some of the causes, kudos to them. It seems
>>> they're voices crying out in the wilderness though.
>>>
>>> There is so much noise at this point that tests are becoming
>>> irrelevant. I'm trying to work on SOLR-10809 for instance, where
>>> there's a pretty good possibility that I'll close at least one thing
>>> that shouldn't be closed. So I ran the full suite 10 times and
>>> gathered all the failures. Now I have to try to separate the failures
>>> caused by that JIRA from the ones that aren't related to it, so I beast
>>> each of the failing tests 100 times against master. If I get a failure
>>> on master too for a particular test, I'll assume it's "not my problem"
>>> and drive on.
>>>
>>> I freely acknowledge that this is poor practice. It's driven by
>>> frustration and the desire to make progress. While it's poor practice,
>>> it's not as bad as only looking at tests that I _think_ are related or
>>> ignoring all test failures I can't instantly recognize as "my fault".
>>>
>>> So what's our stance on this? Mark Miller had a terrific program at
>>> one point allowing at-a-glance categorization of tests that failed,
>>> but it hasn't been updated in a while. Steve Rowe is working on the
>>> problem too. Hoss and Cassandra have both added to the efforts as
>>> well. And I'm sure I'm leaving out others.
>>>
>>> Then there's the @Ignore and @BadApple annotations....
>>>
>>> So, as a community, are we going to devote some energy to this? Or
>>> shall we just start ignoring all of the frequently failing tests?
>>> Frankly we'd be farther ahead at this point marking failing tests that
>>> aren't getting any work with @Ignore or @BadApple and getting
>>> compulsive about not letting any _new_ tests fail than continuing our
>>> current path. I don't _like_ this option mind you, but it's better
>>> than letting these accumulate forever and tests become more and more
>>> difficult to use. As tests become more difficult to use, they're used
>>> less and the problem gets worse.
>>>
>>> Note, I made no effort to separate suite vs. individual reports
>>> here.....
>>>
>>> Erick
>>>
>>> FAILED: junit.framework.TestSuite.org.apache.lucene.index.TestBagOfPositions
>>> FAILED: junit.framework.TestSuite.org.apache.lucene.index.TestIndexWriterDeleteByQuery
>>> FAILED: junit.framework.TestSuite.org.apache.lucene.store.TestSleepingLockWrapper
>>> FAILED: junit.framework.TestSuite.org.apache.solr.analytics.legacy.facet.LegacyFieldFacetCloudTest
>>> FAILED: junit.framework.TestSuite.org.apache.solr.analytics.legacy.facet.LegacyFieldFacetExtrasCloudTest
>>> FAILED: junit.framework.TestSuite.org.apache.solr.analytics.legacy.facet.LegacyQueryFacetCloudTest
>>> FAILED: junit.framework.TestSuite.org.apache.solr.client.solrj.TestLBHttpSolrClient
>>> FAILED: junit.framework.TestSuite.org.apache.solr.cloud.TestSolrCloudWithSecureImpersonation
>>> FAILED: junit.framework.TestSuite.org.apache.solr.cloud.autoscaling.AutoAddReplicasPlanActionTest
>>> FAILED: junit.framework.TestSuite.org.apache.solr.core.AlternateDirectoryTest
>>> FAILED: junit.framework.TestSuite.org.apache.solr.core.TestLazyCores
>>> FAILED: junit.framework.TestSuite.org.apache.solr.handler.component.DistributedFacetPivotSmallAdvancedTest
>>> FAILED: junit.framework.TestSuite.org.apache.solr.ltr.TestSelectiveWeightCreation
>>> FAILED: junit.framework.TestSuite.org.apache.solr.ltr.store.rest.TestModelManager
>>> FAILED: junit.framework.TestSuite.org.apache.solr.rest.schema.analysis.TestManagedSynonymFilterFactory
>>> FAILED: junit.framework.TestSuite.org.apache.solr.search.join.BlockJoinFacetDistribTest
>>> FAILED: junit.framework.TestSuite.org.apache.solr.security.TestAuthorizationFramework
>>> FAILED: junit.framework.TestSuite.org.apache.solr.update.processor.TestOpenNLPExtractNamedEntitiesUpdateProcessorFactory
>>> FAILED: org.apache.lucene.index.TestStressNRT.test
>>> FAILED: org.apache.solr.cloud.AddReplicaTest.test
>>> FAILED: org.apache.solr.cloud.DeleteShardTest.test
>>> FAILED: org.apache.solr.cloud.PeerSyncReplicationTest.test
>>> FAILED: org.apache.solr.cloud.ReplaceNodeNoTargetTest.test
>>> FAILED: org.apache.solr.cloud.TestUtilizeNode.test
>>> FAILED: org.apache.solr.cloud.api.collections.CollectionsAPIDistributedZkTest.testCollectionsAPI
>>> FAILED: org.apache.solr.cloud.api.collections.ShardSplitTest.testSplitAfterFailedSplit
>>> FAILED: org.apache.solr.cloud.autoscaling.AutoAddReplicasIntegrationTest.testSimple
>>> FAILED: org.apache.solr.cloud.autoscaling.ComputePlanActionTest.testNodeWithMultipleReplicasLost
>>> FAILED: org.apache.solr.cloud.autoscaling.HdfsAutoAddReplicasIntegrationTest.testSimple
>>> FAILED: org.apache.solr.cloud.autoscaling.SystemLogListenerTest.test
>>> FAILED: org.apache.solr.cloud.autoscaling.TriggerIntegrationTest.testEventQueue
>>> FAILED: org.apache.solr.cloud.autoscaling.TriggerIntegrationTest.testMetricTrigger
>>> FAILED: org.apache.solr.cloud.autoscaling.TriggerIntegrationTest.testSearchRate
>>> FAILED: org.apache.solr.cloud.autoscaling.sim.TestLargeCluster.testSearchRate
>>> FAILED: org.apache.solr.handler.TestReplicationHandler.doTestIndexAndConfigReplication
>>> FAILED: org.apache.solr.handler.admin.AutoscalingHistoryHandlerTest.testHistory
>>> FAILED: org.apache.solr.rest.schema.analysis.TestManagedSynonymFilterFactory.testCanHandleDecodingAndEncodingForSynonyms
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>

