I was thinking of Cassandra’s reply on that thread: https://lists.apache.org/thread.html/f8d84a669fc009429fcc51873fdd36b1e2b7f6c44b2e7abd9d8cf4fa@%3Cdev.lucene.apache.org%3E, which I’ll quote here:
> From: Cassandra Targett <[email protected]>
> To: [email protected]
> Subject: Re: Test failures are out of control......
> Date: 2018/02/21 23:13:15
> List: [email protected]
>
> This issue is hugely important.
>
> At Lucidworks we have implemented a "Test Confidence" role that focuses on
> improving the ability of all members of the community to trust that reported
> failures from any of the Jenkins systems are actual failures and not flakey
> tests. This role rotates among the committers on our Solr Team, and a
> committer is assigned to the role for 2-week periods of time. Our goal is to
> have at least one committer on our team focused full-time on improving test
> confidence at all times. (Just a note on timing: we started this last summer,
> but we only recently reconfirmed our commitment to having someone assigned to
> it at all times.)
>
> One of the guidelines we've agreed to is that the person in the role should
> not look (only) at tests he has worked on. Instead, he should focus on tests
> that fail less than 100% of the time and/or are hard to reproduce, *even if
> he didn't write the test or the code*.
>
> Another aspect of the Test Confidence role is to try to develop tools that
> can help the community overall in improving this situation. Two things have
> grown out of this effort so far:
>
> * Steve Rowe's work on a Jenkins job to reproduce test failures (LUCENE-8106)
> * Hoss has worked on aggregating all test failures from the 3 Jenkins systems
>   (ASF, Policeman, and Steve's), downloading the test results & logs, and
>   running some reports/stats on failures. He should be ready to share this
>   more publicly soon.
>
> I think it's important to understand that flakey tests will *never* go away.
> There will always be a new flakey test to review/fix. Our goal should be to
> make it so that most of the time you can assume the test is broken and only
> discover it's flakey as part of digging.
>
> The idea of @BadApple marking (or some other notation) is an OK idea, but the
> problem is so bad today that I worry it does nothing to ensure they get
> fixed. Lots of JIRAs get filed for problems with tests - I count about
> 180 open issues today - and many just sit there forever.
>
> The biggest thing I want to avoid is making it even easier to avoid/ignore
> them. We should try to make it easier to highlight them, and we need a
> concerted effort to fix the tests once they've been identified as flakey.

--
Steve
www.lucidworks.com

> On Jun 19, 2018, at 11:15 AM, Simon Willnauer <[email protected]> wrote:
>
> Hi Steve, I saw and followed that thread, but the only outcome that I can
> see is stuff being bad-appled? I might be missing something, and I could go
> and argue specifics on that thread, like:
>
>> Testing distributed systems requires, well, distributed systems, which is
>> what starting clusters is all about.
>
> which is something I have worked on for several years, and I am convinced
> it's a false statement. I didn't want to go down that route, which I think
> boils down to the cultural disconnect. If I missed anything that is answered
> there, I am sorry; I will go and re-read it.
>
> simon
>
> On Tue, Jun 19, 2018 at 4:29 PM, Steve Rowe <[email protected]> wrote:
>> Hi Simon,
>>
>> Have you seen the late-February thread “Test failures are out of control….”?:
>> https://lists.apache.org/thread.html/b783a9d7c22f518b07355e8e4f2c6f56020a7c32f36a58a86d51a3b7@%3Cdev.lucene.apache.org%3E
>>
>> If not, I suggest you go take a look. Some of your questions are answered
>> there.
>>
>> --
>> Steve
>> www.lucidworks.com
>>
>>> On Jun 19, 2018, at 9:41 AM, Simon Willnauer <[email protected]> wrote:
>>>
>>> Thanks folks, I appreciate you sharing some thoughts about this. My
>>> biggest issue is that this is a permanent condition. I could have sent
>>> this mail 2, 4, or 6 years ago and it would have been as relevant as it
>>> is today.
>>>
>>> I am convinced Mark can make some progress, but this isn't fixable by a
>>> single person; this is a structural problem, or rather a cultural one.
>>> I am not sure everybody is aware of how terrible it is. I took a
>>> screenshot the other day of what I have to dig through in my inbox on a
>>> constant basis, every time I commit a change to Lucene, to make sure I
>>> am not missing something.
>>>
>>> <image.png>
>>>
>>> I don't even know how we can attract any new contributors, or how many
>>> contributors have been scared away by this in the past. This is not
>>> good, and bad-appling these tests isn't the answer unless we put a lot
>>> of effort into it; sorry, I don't see that happening. I would have
>>> expected more than about 4 people from this PMC to reply to something
>>> like this. From my perspective a lot of harm is being done to the
>>> project by this, and we have to figure out what we want to do. This
>>> also affects our ability to release - folks, our smoke-test builds
>>> never pass [1]. I don't know what I would do if I were the RM for 7.4
>>> (thanks, Adrien, for doing it); I cannot tell what is serious and what
>>> is not on a Solr build. And it's not just the smoke tester; it's
>>> basically everything that runs after Solr that gets skipped on a
>>> regular basis.
>>>
>>> I don't have a good answer, but we have to get this under control. It's
>>> burdensome for Lucene to carry this load, and it has been carrying it
>>> for quite some time. It wasn't obvious to me how heavily this weighs,
>>> since I hadn't worked on Lucene internals for quite a while; speaking
>>> to many folks around here, it is on their shoulders too, but it's not
>>> brought up for discussion. I think we have to.
>>>
>>> simon
>>>
>>> [1] https://builds.apache.org/job/Lucene-Solr-SmokeRelease-7.4/
>>>
>>> On Sat, Jun 16, 2018 at 6:40 AM, Erick Erickson <[email protected]> wrote:
>>> Martin:
>>>
>>> I have no idea how logging severity levels apply to unit tests that
>>> fail. It's not a question of triaging logs, it's a matter of Jenkins
>>> junit test runs reporting failures.
>>>
>>> On Fri, Jun 15, 2018 at 4:25 PM, Martin Gainty <[email protected]> wrote:
>>> Erick-
>>>
>>> It appears that failures which might be categorised as INFO (e.g. style
>>> mis-application) are mixed in with SEVERE errors.
>>>
>>> Would it make sense to filter the errors based on severity?
>>>
>>> https://docs.oracle.com/javase/7/docs/api/java/util/logging/Level.html
>>>
>>> If you know the severity, you can triage the SEVERE errors before
>>> working down to the INFO errors.
>>>
>>> WDYT?
>>> Martin
>>> ______________________________________________
>>>
>>> From: Erick Erickson <[email protected]>
>>> Sent: Friday, June 15, 2018 1:05 PM
>>> To: [email protected]; Mark Miller
>>> Subject: Re: Status of solr tests
>>>
>>> Mark (and everyone).
>>>
>>> I'm trying to be somewhat conservative about what I BadApple; at this
>>> point it's only things that have failed every week for the last 4.
>>> Part of that conservatism is to avoid BadApple'ing tests that are
>>> failing and _should_ fail.
>>>
>>> I'm explicitly _not_ delving into any of the causes at all at this
>>> point; it's overwhelming until we reduce the noise, as everyone knows.
>>>
>>> So please feel totally free to BadApple anything you know is flakey,
>>> it won't intrude on my turf ;)
>>>
>>> And since I realized I can also report tests that have _not_ failed in
>>> a month but _are_ BadApple'd, we can be a little freer with
>>> BadApple'ing tests, since there's a mechanism for un-annotating them
>>> without a lot of tedious effort.
>>>
>>> FWIW.
>>>
>>> On Fri, Jun 15, 2018 at 9:09 AM, Mark Miller <[email protected]> wrote:
>>>> There is an okay chance I'm going to start making some improvements
>>>> here as well. I've been working on a very stable set of tests on my
>>>> starburst branch and will slowly bring in test fixes over time (I've
>>>> already been making some on that branch for important tests). We
>>>> should currently be defaulting to tests.badapples=false on all Solr
>>>> test runs - it's a joke to try and get a clean run otherwise, and even
>>>> then somehow 4 or 5 tests that fail somewhat commonly have so far
>>>> avoided Erick's @BadApple hack and slash. They are bad-appled on my
>>>> dev branch now, but that is currently where any time I have is spent,
>>>> rather than on the main dev branches.
>>>>
>>>> Also, too many flakey tests are introduced because devs are not
>>>> beasting, or not beasting well, before committing new heavy tests.
>>>> Perhaps we could add some docs around that.
>>>>
>>>> We have built-in beasting support; we need to emphasize that a couple
>>>> of passes on a new test is not sufficient to test its quality.
>>>>
>>>> - Mark
>>>>
>>>> On Fri, Jun 15, 2018 at 9:46 AM Erick Erickson <[email protected]> wrote:
>>>>>
>>>>> (Siiiiggggghhhh) All very true. You're not alone in your frustration.
>>>>>
>>>>> I've been trying to at least BadApple tests that fail consistently,
>>>>> so another option could be to disable BadApple'd tests. My hope has
>>>>> been to get to the point of being able to reliably get clean runs, at
>>>>> least when BadApple'd tests are disabled.
>>>>>
>>>>> From that point I want to draw a line in the sand and immediately
>>>>> address tests that fail that are _not_ BadApple'd. At least then
>>>>> we'll stop getting _worse_. And then we can work on the BadApple'd
>>>>> tests. But as David says, that's not going to be any time soon. It's
>>>>> been a couple of months that I've been trying to just get the tests
>>>>> BadApple'd, without even trying to fix any of them.
>>>>>
>>>>> It's particularly pernicious because with all the noise we don't see
>>>>> failures we _should_ see.
>>>>>
>>>>> So I don't have any good short-term answer either. We've built up a
>>>>> very large technical debt in the testing. The first step is to stop
>>>>> adding more debt, which is what I've been working on so far. And
>>>>> that's the easy part....
>>>>>
>>>>> Siiiiiiiiiiiiiigggggggggghhhhhhhhhh
>>>>>
>>>>> Erick
>>>>>
>>>>> On Fri, Jun 15, 2018 at 5:29 AM, David Smiley <[email protected]> wrote:
>>>>>> (Sigh) I sympathize with your points Simon. I'm +1 to modify the
>>>>>> Lucene-side JIRA QA bot (Yetus) to not execute Solr tests.
>>>>>> We can and are
>>>>>> trying to improve the stability of the Solr tests, but even
>>>>>> optimistically the practical reality is that it won't be good enough
>>>>>> anytime soon. When we get there, we can reverse this.
>>>>>>
>>>>>> On Fri, Jun 15, 2018 at 3:32 AM Simon Willnauer <[email protected]> wrote:
>>>>>>>
>>>>>>> folks,
>>>>>>>
>>>>>>> I got more active working on IndexWriter and soft-deletes etc. in
>>>>>>> the last couple of weeks. It's a blast again and I really enjoy it.
>>>>>>> The one thing that is IMO not acceptable is the status of the Solr
>>>>>>> tests. I have tried so many times to get them passing on several
>>>>>>> different OSs, but it seems this is pretty hopeless. It gets even
>>>>>>> worse: the Lucene/Solr QA job literally marks every ticket I attach
>>>>>>> a patch to as `-1` because of arbitrary Solr tests. Here is an
>>>>>>> example:
>>>>>>>
>>>>>>> || Reason || Tests ||
>>>>>>> | Failed junit tests | solr.rest.TestManagedResourceStorage |
>>>>>>> | | solr.cloud.autoscaling.SearchRateTriggerIntegrationTest |
>>>>>>> | | solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest |
>>>>>>> | | solr.client.solrj.impl.CloudSolrClientTest |
>>>>>>> | | solr.common.util.TestJsonRecordReader |
>>>>>>>
>>>>>>> Speaking to other committers, I hear we should just disable this
>>>>>>> job. Sorry, WTF?
>>>>>>>
>>>>>>> These tests seem to fail all the time, randomly, over and over
>>>>>>> again. This renders them entirely useless to me. I even invest
>>>>>>> time (wrong, I invested) looking into whether they are caused by me
>>>>>>> or whether I can do something about them. Yet someone could call me
>>>>>>> out for being responsible for them as a committer - yes I am, hence
>>>>>>> this email. I don't think I am obliged to fix them. These projects
>>>>>>> have 50+ committers, and having a shared codebase doesn't mean
>>>>>>> everybody has to take care of everything. I think we are at the
>>>>>>> point where, if I work on Lucene, I won't run Solr tests at all;
>>>>>>> otherwise there won't be any progress. On the other hand, Solr
>>>>>>> tests never pass, yet I wonder whether the Solr code-base gets
>>>>>>> changed nevertheless? That is again a terrible situation.
>>>>>>>
>>>>>>> I spoke to Varun and Anshum during Buzzwords about whether they
>>>>>>> could give me some hints on what I am doing wrong, but it seems
>>>>>>> this is just the way it is. I feel terrible pushing stuff to our
>>>>>>> repo and still seeing our tests fail. I get ~15 build failures from
>>>>>>> Solr tests a day, and I am not the only one who has mail filters to
>>>>>>> archive them if there isn't a Lucene test among the failures.
>>>>>>>
>>>>>>> This is a terrible state, folks. How do we fix it? It's the Lucene
>>>>>>> land that gets much love on the testing end, but that also requires
>>>>>>> more work; I expect Solr to do the same. That, at the same time,
>>>>>>> requires a stop to pushing new stuff until the situation is under
>>>>>>> control. The effort of marking stuff as bad apples isn't the
>>>>>>> answer; this requires effort from the drivers behind this project.
>>>>>>>
>>>>>>> simon
>>>>>>
>>>>>> --
>>>>>> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>>>>>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>>>>>> http://www.solrenterprisesearchserver.com
>>>>
>>>> --
>>>> - Mark
>>>> about.me/markrmiller
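
A footnote from me for anyone reading along who hasn't worked in this
codebase: the @BadApple annotation the thread keeps coming back to is
defined in LuceneTestCase, and marking a flaky test is a one-line change
plus a JIRA reference. A minimal sketch - the class name and issue number
below are invented for illustration:

    import org.apache.lucene.util.LuceneTestCase;
    import org.apache.lucene.util.LuceneTestCase.BadApple;

    // Hypothetical flaky test, for illustration only. The required bugUrl
    // points at the JIRA issue tracking the flakiness, so the annotation
    // can be removed once the underlying problem is fixed.
    @BadApple(bugUrl = "https://issues.apache.org/jira/browse/SOLR-XXXXX")
    public class MyFlakySolrTest extends LuceneTestCase {
      public void testSomethingDistributed() throws Exception {
        // ... test body elided ...
      }
    }

As of this thread, annotated tests still ran by default; they were only
skipped when the runner was told to ignore them, which is what Mark's
tests.badapples=false point is about.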

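On that note, the beasting Mark mentions is mechanical once you know the
knobs. From memory of the 7.x ant build (check lucene/common-build.xml for
the authoritative property names), the two invocations the thread implies
are roughly:

    # Run the suite while skipping @BadApple'd tests, as Mark proposes
    # should be the default:
    ant test -Dtests.badapples=false

    # "Beast" a new test before committing it: run it over and over with
    # fresh random seeds to flush out flakiness a single pass would miss.
    ant beast -Dbeast.iters=10 -Dtests.class="*.MyFlakySolrTest"

A couple of clean passes prove very little; dozens of seeded runs are what
catch the timing and ordering bugs that otherwise land in everyone's inbox
as Jenkins noise.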