I was thinking of Cassandra’s reply on that thread: https://lists.apache.org/thread.html/f8d84a669fc009429fcc51873fdd36b1e2b7f6c44b2e7abd9d8cf4fa@%3Cdev.lucene.apache.org%3E, which I’ll quote here:
> From: Cassandra Targett <[email protected]>
> To: [email protected]
> Subject: Re: Test failures are out of control......
> Date: 2018/02/21 23:13:15
> List: [email protected]
>
> This issue is hugely important.
>
> At Lucidworks we have implemented a "Test Confidence" role that focuses on
> improving the ability of all members of the community to trust that reported
> failures from any of the Jenkins systems are actual failures and not flakey
> tests. This role rotates among the committers on our Solr Team, and a
> committer is assigned to the role for 2-week periods of time. Our goal is to
> have at least one committer on our team focused full-time on improving test
> confidence at all times. (Just a note on timing: we started this last summer,
> but we only recently reconfirmed our commitment to having someone assigned to
> it at all times.)
>
> One of the guidelines we've agreed to is that the person in the role should
> not look (only) at tests he has worked on. Instead, he should focus on tests
> that fail less than 100% of the time and/or are hard to reproduce, *even if
> he didn't write the test or the code*.
>
> Another aspect of the Test Confidence role is to try to develop tools that
> can help the community overall in improving this situation. Two things have
> grown out of this effort so far:
>
> * Steve Rowe's work on a Jenkins job to reproduce test failures (LUCENE-8106)
> * Hoss has worked on aggregating all test failures from the 3 Jenkins systems
>   (ASF, Policeman, and Steve's), downloading the test results & logs, and
>   running some reports/stats on failures. He should be ready to share this
>   more publicly soon.
>
> I think it's important to understand that flakey tests will *never* go away.
> There will always be a new flakey test to review/fix. Our goal should be to
> make it so that most of the time you can assume the test is broken and only
> discover it's flakey as part of digging.
>
> The idea of @BadApple marking (or some other notation) is an OK idea, but the
> problem is so bad today that I worry it does nothing to ensure they get
> fixed. Lots of JIRAs get filed for problems with tests - I count about
> 180 open issues today - and many just sit there forever.
>
> The biggest thing I want to avoid is making it even easier to avoid/ignore
> them. We should try to make it easier to highlight them, and we need a
> concerted effort to fix the tests once they've been identified as flakey.

--
Steve
www.lucidworks.com

> On Jun 19, 2018, at 11:15 AM, Simon Willnauer <[email protected]> wrote:
>
> Hi Steve, I saw and followed that thread, but the only outcome that I can
> see is stuff being bad-appled? I might be missing something, and I could go
> and argue specifics on that thread, like:
>
>> Testing distributed systems requires, well, distributed systems, which is
>> what starting clusters is all about.
>
> which is something I have worked on for several years, and I am convinced
> it's a false statement. I didn't want to go down that route, which I think
> boils down to the cultural disconnect. If I missed anything that is answered
> there, I am sorry; I will go and re-read it.
>
> simon
>
> On Tue, Jun 19, 2018 at 4:29 PM, Steve Rowe <[email protected]> wrote:
>> Hi Simon,
>>
>> Have you seen the late-February thread “Test failures are out of control….”?:
>> https://lists.apache.org/thread.html/b783a9d7c22f518b07355e8e4f2c6f56020a7c32f36a58a86d51a3b7@%3Cdev.lucene.apache.org%3E
>>
>> If not, I suggest you go take a look. Some of your questions are answered
>> there.
>>
>> --
>> Steve
>> www.lucidworks.com
>>
>>> On Jun 19, 2018, at 9:41 AM, Simon Willnauer <[email protected]> wrote:
>>>
>>> Thanks folks, I appreciate you sharing some thoughts about this. My
>>> biggest issue is that this is a permanent condition. I could have sent
>>> this mail 2, 4, or 6 years ago and it would have been as relevant as it
>>> is today.
>>>
>>> I am convinced Mark can make some progress, but this isn't fixable by a
>>> single person; this is a structural problem, or rather a cultural one.
>>> I am not sure everybody is aware of how terrible it is. I took a
>>> screenshot the other day of what I have to dig through in my inbox on a
>>> constant basis, every time I commit a change to Lucene, to make sure I
>>> am not missing something.
>>>
>>> <image.png>
>>>
>>> I don't even know how we can attract any new contributors, or how many
>>> contributors have been scared away by this in the past. This is not
>>> good, and bad-appling these tests isn't the answer unless we put a lot
>>> of effort into it; sorry, I don't see that happening. I would have
>>> expected more than about 4 people from this PMC to reply to something
>>> like this. From my perspective a lot of harm is being done to the
>>> project by this, and we have to figure out what we want to do. This
>>> also affects our ability to release - folks, our smoke-test builds
>>> never pass [1]. I don't know what I would do if I were the RM for 7.4
>>> (thanks, Adrien, for doing it); I cannot tell what is serious and what
>>> is not on a Solr build. And it's not just the smoke tester; it's
>>> basically everything that runs after Solr that gets skipped on a
>>> regular basis.
>>>
>>> I don't have a good answer, but we have to get this under control. It's
>>> burdensome for Lucene to carry this load, and it has been carrying it
>>> for quite some time. It wasn't obvious to me how heavily this weighs,
>>> since I hadn't worked on Lucene internals for quite a while; speaking
>>> to many folks around here, it is on their shoulders too, but it's not
>>> brought up for discussion. I think we have to.
>>>
>>> simon
>>>
>>> [1] https://builds.apache.org/job/Lucene-Solr-SmokeRelease-7.4/
>>>
>>> On Sat, Jun 16, 2018 at 6:40 AM, Erick Erickson <[email protected]> wrote:
>>> Martin:
>>>
>>> I have no idea how logging severity levels apply to unit tests that
>>> fail. It's not a question of triaging logs, it's a matter of Jenkins
>>> junit test runs reporting failures.
>>>
>>> On Fri, Jun 15, 2018 at 4:25 PM, Martin Gainty <[email protected]> wrote:
>>> Erick-
>>>
>>> It appears that failures which might be categorised as INFO (e.g. style
>>> mis-application) are mixed in with SEVERE errors.
>>>
>>> Would it make sense to filter the errors based on severity?
>>>
>>> https://docs.oracle.com/javase/7/docs/api/java/util/logging/Level.html
>>>
>>> If you know the severity, you can triage the SEVERE errors before
>>> working down to the INFO errors.
>>>
>>> WDYT?
>>> Martin
>>> ______________________________________________
>>>
>>> From: Erick Erickson <[email protected]>
>>> Sent: Friday, June 15, 2018 1:05 PM
>>> To: [email protected]; Mark Miller
>>> Subject: Re: Status of solr tests
>>>
>>> Mark (and everyone).
>>>
>>> I'm trying to be somewhat conservative about what I BadApple; at this
>>> point it's only things that have failed every week for the last 4.
>>> Part of that conservatism is to avoid BadApple'ing tests that are
>>> failing and _should_ fail.
>>>
>>> I'm explicitly _not_ delving into any of the causes at all at this
>>> point; it's overwhelming until we reduce the noise, as everyone knows.
>>>
>>> So please feel totally free to BadApple anything you know is flakey,
>>> it won't intrude on my turf ;)
>>>
>>> And since I realized I can also report tests that have _not_ failed in
>>> a month but _are_ BadApple'd, we can be a little freer with
>>> BadApple'ing tests, since there's a mechanism for un-annotating them
>>> without a lot of tedious effort.
>>>
>>> FWIW.
>>>
>>> On Fri, Jun 15, 2018 at 9:09 AM, Mark Miller <[email protected]> wrote:
>>>> There is an okay chance I'm going to start making some improvements
>>>> here as well. I've been working on a very stable set of tests on my
>>>> starburst branch and will slowly bring in test fixes over time (I've
>>>> already been making some on that branch for important tests). We
>>>> should currently be defaulting to tests.badapples=false on all Solr
>>>> test runs - it's a joke to try and get a clean run otherwise, and even
>>>> then somehow 4 or 5 tests that fail somewhat commonly have so far
>>>> avoided Erick's @BadApple hack and slash. They are bad-appled on my
>>>> dev branch now, but that is currently where any time I have is spent,
>>>> rather than on the main dev branches.
>>>>
>>>> Also, too many flakey tests are introduced because devs are not
>>>> beasting, or not beasting well, before committing new heavy tests.
>>>> Perhaps we could add some docs around that.
>>>>
>>>> We have built-in beasting support; we need to emphasize that a couple
>>>> of passes on a new test is not sufficient to test its quality.
>>>>
>>>> - Mark
>>>>
>>>> On Fri, Jun 15, 2018 at 9:46 AM Erick Erickson <[email protected]> wrote:
>>>>>
>>>>> (Siiiiggggghhhh) All very true. You're not alone in your frustration.
>>>>>
>>>>> I've been trying to at least BadApple tests that fail consistently,
>>>>> so another option could be to disable BadApple'd tests. My hope has
>>>>> been to get to the point of being able to reliably get clean runs, at
>>>>> least when BadApple'd tests are disabled.
>>>>>
>>>>> From that point I want to draw a line in the sand and immediately
>>>>> address tests that fail that are _not_ BadApple'd. At least then
>>>>> we'll stop getting _worse_. And then we can work on the BadApple'd
>>>>> tests. But as David says, that's not going to be any time soon. It's
>>>>> been a couple of months that I've been trying to just get the tests
>>>>> BadApple'd, without even trying to fix any of them.
>>>>>
>>>>> It's particularly pernicious because with all the noise we don't see
>>>>> failures we _should_ see.
>>>>>
>>>>> So I don't have any good short-term answer either. We've built up a
>>>>> very large technical debt in the testing. The first step is to stop
>>>>> adding more debt, which is what I've been working on so far. And
>>>>> that's the easy part....
>>>>>
>>>>> Siiiiiiiiiiiiiigggggggggghhhhhhhhhh
>>>>>
>>>>> Erick
>>>>>
>>>>> On Fri, Jun 15, 2018 at 5:29 AM, David Smiley <[email protected]> wrote:
>>>>>> (Sigh) I sympathize with your points Simon. I'm +1 to modify the
>>>>>> Lucene-side JIRA QA bot (Yetus) to not execute Solr tests.
>>>>>> We can and are
>>>>>> trying to improve the stability of the Solr tests, but even
>>>>>> optimistically the practical reality is that it won't be good enough
>>>>>> anytime soon. When we get there, we can reverse this.
>>>>>>
>>>>>> On Fri, Jun 15, 2018 at 3:32 AM Simon Willnauer <[email protected]> wrote:
>>>>>>>
>>>>>>> folks,
>>>>>>>
>>>>>>> I got more active working on IndexWriter and soft-deletes etc. in
>>>>>>> the last couple of weeks. It's a blast again and I really enjoy it.
>>>>>>> The one thing that is IMO not acceptable is the status of the Solr
>>>>>>> tests. I have tried so many times to get them passing on several
>>>>>>> different OSs, but it seems this is pretty hopeless. It gets even
>>>>>>> worse: the Lucene/Solr QA job literally marks every ticket I attach
>>>>>>> a patch to as `-1` because of arbitrary Solr tests. Here is an
>>>>>>> example:
>>>>>>>
>>>>>>> || Reason || Tests ||
>>>>>>> | Failed junit tests | solr.rest.TestManagedResourceStorage |
>>>>>>> | | solr.cloud.autoscaling.SearchRateTriggerIntegrationTest |
>>>>>>> | | solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest |
>>>>>>> | | solr.client.solrj.impl.CloudSolrClientTest |
>>>>>>> | | solr.common.util.TestJsonRecordReader |
>>>>>>>
>>>>>>> Speaking to other committers, I hear we should just disable this
>>>>>>> job. Sorry, WTF?
>>>>>>>
>>>>>>> These tests seem to fail all the time, randomly, over and over
>>>>>>> again. This renders them entirely useless to me. I even invest
>>>>>>> time (wrong, I invested) looking into whether they are caused by me
>>>>>>> or whether I can do something about them. Yet someone could call me
>>>>>>> out for being responsible for them as a committer - yes I am, hence
>>>>>>> this email. I don't think I am obliged to fix them. These projects
>>>>>>> have 50+ committers, and having a shared codebase doesn't mean
>>>>>>> everybody has to take care of everything. I think we are at the
>>>>>>> point where, if I work on Lucene, I won't run Solr tests at all;
>>>>>>> otherwise there won't be any progress. On the other hand, Solr
>>>>>>> tests never pass, yet I wonder whether the Solr code-base gets
>>>>>>> changed nevertheless? That is again a terrible situation.
>>>>>>>
>>>>>>> I spoke to Varun and Anshum during Buzzwords about whether they
>>>>>>> could give me some hints on what I am doing wrong, but it seems
>>>>>>> this is just the way it is. I feel terrible pushing stuff to our
>>>>>>> repo and still seeing our tests fail. I get ~15 build failures from
>>>>>>> Solr tests a day, and I am not the only one who has mail filters to
>>>>>>> archive them if there isn't a Lucene test among the failures.
>>>>>>>
>>>>>>> This is a terrible state, folks. How do we fix it? It's the Lucene
>>>>>>> land that gets much love on the testing end, but that also requires
>>>>>>> more work; I expect Solr to do the same. That, at the same time,
>>>>>>> requires a stop to pushing new stuff until the situation is under
>>>>>>> control. The effort of marking stuff as bad apples isn't the
>>>>>>> answer; this requires effort from the drivers behind this project.
>>>>>>>
>>>>>>> simon
>>>>>>
>>>>>> --
>>>>>> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>>>>>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>>>>>> http://www.solrenterprisesearchserver.com
>>>>
>>>> --
>>>> - Mark
>>>> about.me/markrmiller
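
A footnote from me for anyone reading along who hasn't worked in this
codebase: the @BadApple annotation the thread keeps coming back to is
defined in LuceneTestCase, and marking a flaky test is a one-line change
plus a JIRA reference. A minimal sketch - the class name and issue number
below are invented for illustration:

    import org.apache.lucene.util.LuceneTestCase;
    import org.apache.lucene.util.LuceneTestCase.BadApple;

    // Hypothetical flaky test, for illustration only. The required bugUrl
    // points at the JIRA issue tracking the flakiness, so the annotation
    // can be removed once the underlying problem is fixed.
    @BadApple(bugUrl = "https://issues.apache.org/jira/browse/SOLR-XXXXX")
    public class MyFlakySolrTest extends LuceneTestCase {
      public void testSomethingDistributed() throws Exception {
        // ... test body elided ...
      }
    }

As of this thread, annotated tests still ran by default; they were only
skipped when the runner was told to ignore them, which is what Mark's
tests.badapples=false point is about.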

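On that note, the beasting Mark mentions is mechanical once you know the
knobs. From memory of the 7.x ant build (check lucene/common-build.xml for
the authoritative property names), the two invocations the thread implies
are roughly:

    # Run the suite while skipping @BadApple'd tests, as Mark proposes
    # should be the default:
    ant test -Dtests.badapples=false

    # "Beast" a new test before committing it: run it over and over with
    # fresh random seeds to flush out flakiness a single pass would miss.
    ant beast -Dbeast.iters=10 -Dtests.class="*.MyFlakySolrTest"

A couple of clean passes prove very little; dozens of seeded runs are what
catch the timing and ordering bugs that otherwise land in everyone's inbox
as Jenkins noise.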