" This is not good and bad-appeling these test isn't the answer unless
we put a lot of effort into it, sorry I don't see it happening."

This ticks me off. I've spent considerable time over the last 4 months
trying to get to a point where we can stop getting _worse_ as a
necessary _first_ step to getting better. Lucidworks is putting effort
in that direction too. What other concrete actions do you recommend
going forward?

Of course just BadApple'ing tests isn't a satisfactory answer. And the
e-mail filters I've set up so that I see only the failures from runs
that do _not_ execute BadApple'd tests are dangerous and completely
crappy. Unfortunately I don't have a magic wand to make it all better,
so this stop-gap (I hope) allows progress.

We'll know progress is being made when the weekly BadApple reports
show a declining number of tests that are annotated. Certainly not
there yet, but working on it.

Perhaps you missed the point of the BadApple exercise. Reasonably soon
I hope to be at a point where we can draw a line in the sand and say
"This is a new failure; fix it or roll back the changes". Then we can
get persnickety about not adding _new_ failures. Then we can reduce
the backlog.

And the result of these efforts may be me curling into a ball and
sucking my thumb because the problem is intractable. We'll see.

One temporary-but-maybe-necessary option is to run with the BadApple'd
tests disabled. I don't like that either, but it's better than not
running tests at all.

Unfortunately, when I'm working on code I have to use another crappy
work-around: run the tests, then re-run any failing tests and assume
that if the re-run succeeds the failure was a flaky test. The BadApple
annotations are helpful for that too: soon I hope to have confidence
that we've annotated most of the flaky tests, so if a failing test
passes on a re-run _and_ it's annotated, it's probably OK. It's a
horrible process, no question about that, but I have to start
somewhere.
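
For anyone who hasn't looked at the mechanics, there's nothing magical
about the annotation itself; a BadApple'd test is roughly the sketch
below (the class name and JIRA id are placeholders):

    import org.apache.lucene.util.LuceneTestCase.BadApple;
    import org.apache.solr.SolrTestCaseJ4;
    import org.junit.Test;

    // Skipped when a run sets -Dtests.badapples=false; the bugUrl points
    // at the JIRA issue tracking the flakiness so the annotation can be
    // removed once the underlying problem is fixed.
    @BadApple(bugUrl = "https://issues.apache.org/jira/browse/SOLR-NNNNN")
    public class SomeFlakyCloudTest extends SolrTestCaseJ4 {
      @Test
      public void testSomethingFlaky() throws Exception {
        // ... the actual test body ...
      }
    }

The weekly reports count tests carrying that annotation, and the
tests.badapples=false runs Mark mentions below are what skip them.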

Again, what additional steps do you recommend?

Erick

On Tue, Jun 19, 2018 at 7:29 AM, Steve Rowe <[email protected]> wrote:
> Hi Simon,
>
> Have you seen the late-February thread “Test failures are out of control….”? 
> https://lists.apache.org/thread.html/b783a9d7c22f518b07355e8e4f2c6f56020a7c32f36a58a86d51a3b7@%3Cdev.lucene.apache.org%3E
>
> If not, I suggest you go take a look.  Some of your questions are answered 
> there.
>
> --
> Steve
> www.lucidworks.com
>
>> On Jun 19, 2018, at 9:41 AM, Simon Willnauer <[email protected]> 
>> wrote:
>>
>> Thanks folks, I appreciate you sharing some thoughts about this. My
>> biggest issue is that this is a permanent condition. I could have sent
>> this mail 2, 4 or 6 years ago and it would have been just as relevant
>> as it is today.
>>
>> I am convinced Mark can make some progress, but this isn't fixable by
>> a single person; it's a structural problem, or rather a cultural one.
>> I am not sure everybody is aware of how terrible it is. I took a
>> screenshot of my inbox the other day to show what I have to dig
>> through on a constant basis every time I commit a change to Lucene to
>> make sure I am not missing something.
>>
>> <image.png>
>>
>> I don't even know how we can attract any new contributors, or how many
>> contributors have been scared away by this in the past. This is not
>> good, and BadApple'ing these tests isn't the answer unless we put a
>> lot of effort into it; sorry, I don't see it happening. I would have
>> expected more than about 4 people from this PMC to reply to something
>> like this. From my perspective a lot of harm is being done to the
>> project by this, and we have to figure out what we want to do. This
>> also affects our ability to release: our smoke-test builds never pass
>> [1]. I wouldn't know what to do if I were the RM for 7.4 (thanks
>> Adrien for doing it); I cannot tell what is serious and what is not on
>> a Solr build. And it's not just the smoke tester; basically everything
>> that runs after Solr is skipped on a regular basis.
>>
>> I don't have a good answer, but we have to get this under control.
>> It's burdensome for Lucene to carry this load, and it has been
>> carrying it for quite some time. It wasn't obvious to me how heavily
>> this weighs, since I hadn't worked on Lucene internals for quite a
>> while; speaking to many folks around here, it is on their shoulders
>> too, but it isn't brought up for discussion. I think we have to.
>>
>> simon
>>
>> [1] https://builds.apache.org/job/Lucene-Solr-SmokeRelease-7.4/
>>
>>
>> On Sat, Jun 16, 2018 at 6:40 AM, Erick Erickson <[email protected]> 
>> wrote:
>> Martin:
>>
>> I have no idea how logging severity levels apply to unit tests that
>> fail. It's not a question of triaging logs; it's a matter of Jenkins
>> JUnit test runs reporting failures.
>>
>>
>>
>> On Fri, Jun 15, 2018 at 4:25 PM, Martin Gainty <[email protected]> wrote:
>> Erick-
>>
>> It appears that style mis-applications may be categorised as INFO and
>> are mixed in with the SEVERE errors.
>>
>> Would it make sense to filter the errors based on severity?
>>
>> https://docs.oracle.com/javase/7/docs/api/java/util/logging/Level.html
>> If you know the severity, you can triage the SEVERE errors before
>> working down to the INFO errors.
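>>
>> Just as a sketch of what I mean with java.util.logging (the class and
>> message names here are made up; only the Level handling is the point):
>>
>>     import java.util.logging.Handler;
>>     import java.util.logging.Level;
>>     import java.util.logging.Logger;
>>
>>     public class SeverityTriage {
>>       public static void main(String[] args) {
>>         Logger root = Logger.getLogger("");
>>         // Only let SEVERE records through on the existing handlers.
>>         for (Handler h : root.getHandlers()) {
>>           h.setLevel(Level.SEVERE);
>>         }
>>         root.setLevel(Level.SEVERE);
>>
>>         Logger log = Logger.getLogger(SeverityTriage.class.getName());
>>         log.info("style mis-application");  // filtered out
>>         log.severe("real failure");         // shown
>>       }
>>     }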
>>
>>
>> WDYT?
>> Martin
>> ______________________________________________
>>
>>
>>
>> From: Erick Erickson <[email protected]>
>> Sent: Friday, June 15, 2018 1:05 PM
>> To: [email protected]; Mark Miller
>> Subject: Re: Status of solr tests
>>
>> Mark (and everyone).
>>
>> I'm trying to be somewhat conservative about what I BadApple; at this
>> point it's only things that have failed every week for the last 4
>> weeks. Part of that conservatism is to avoid BadApple'ing tests that
>> are failing and _should_ fail.
>>
>> I'm explicitly _not_ delving into any of the causes at all at this
>> point; as everyone knows, it's overwhelming until we reduce the noise.
>>
>> So please feel totally free to BadApple anything you know is flaky;
>> it won't intrude on my turf ;)
>>
>> And since I realized I can also report tests that have _not_ failed in
>> a month that _are_ BadApple'd, we can be a little freer with
>> BadApple'ing tests since there's a mechanism for un-annotating them
>> without a lot of tedious effort.
>>
>> FWIW.
>>
>> On Fri, Jun 15, 2018 at 9:09 AM, Mark Miller <[email protected]> wrote:
>> > There is an okay chance I'm going to start making some improvements
>> > here as well. I've been working on a very stable set of tests on my
>> > starburst branch and will slowly bring in test fixes over time (I've
>> > already been making some on that branch for important tests). We
>> > should currently be defaulting to tests.badapples=false on all Solr
>> > test runs - it's a joke to try to get a clean run otherwise, and even
>> > then somehow 4 or 5 tests that fail somewhat commonly have so far
>> > avoided Erick's @BadApple hack and slash. They are BadApple'd on my
>> > dev branch now, but that is currently where any time I have is spent
>> > rather than on the main dev branches.
>> >
>> > Also, too many flaky tests are introduced because devs are not
>> > beasting, or not beasting well, before committing new heavy tests.
>> > Perhaps we could add some docs around that.
>> >
>> > We have built-in beasting support; we need to emphasize that a couple
>> > of passes on a new test is not sufficient to establish its quality.
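>> >
>> > Even short of full beasting, something as simple as the test
>> > framework's @Repeat annotation while developing a new test catches a
>> > lot of this. A rough sketch (the class name and iteration count are
>> > arbitrary; the built-in beasting does something similar across whole
>> > forked JVM runs):
>> >
>> >     import com.carrotsearch.randomizedtesting.annotations.Repeat;
>> >     import org.apache.solr.SolrTestCaseJ4;
>> >     import org.junit.Test;
>> >
>> >     public class MyNewFeatureTest extends SolrTestCaseJ4 {
>> >       // Hammer the new test repeatedly in one JVM to shake out timing
>> >       // and ordering flakiness before committing.
>> >       @Test
>> >       @Repeat(iterations = 100)
>> >       public void testNewFeature() throws Exception {
>> >         // ... test body ...
>> >       }
>> >     }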
>> >
>> > - Mark
>> >
>> > On Fri, Jun 15, 2018 at 9:46 AM Erick Erickson <[email protected]>
>> > wrote:
>> >>
>> >> (Siiiiggggghhhh) All very true. You're not alone in your frustration.
>> >>
>> >> I've been trying to at least BadApple tests that fail consistently, so
>> >> another option could be to disable BadApple'd tests. My hope has been
>> >> to get to the point of being able to reliably get clean runs, at least
>> >> when BadApple'd tests are disabled.
>> >>
>> >> From that point I want to draw a line in the sand and immediately
>> >> address tests that fail that are _not_ BadApple'd. At least then we'll
>> >> stop getting _worse_. And then we can work on the BadApple'd tests.
>> >> But as David says, that's not going to be any time soon. It's been a
>> >> couple of months that I've been trying to just get the tests
>> >> BadApple'd without even trying to fix any of them.
>> >>
>> >> It's particularly pernicious because with all the noise we don't see
>> >> failures we _should_ see.
>> >>
>> >> So I don't have any good short-term answer either. We've built up a
>> >> very large technical debt in the testing. The first step is to stop
>> >> adding more debt, which is what I've been working on so far. And
>> >> that's the easy part....
>> >>
>> >> Siiiiiiiiiiiiiigggggggggghhhhhhhhhh
>> >>
>> >> Erick
>> >>
>> >>
>> >> On Fri, Jun 15, 2018 at 5:29 AM, David Smiley <[email protected]>
>> >> wrote:
>> >> > (Sigh) I sympathize with your points, Simon.  I'm +1 to modify the
>> >> > Lucene-side JIRA QA bot (Yetus) to not execute Solr tests.  We can and
>> >> > are
>> >> > trying to improve the stability of the Solr tests but even
>> >> > optimistically
>> >> > the practical reality is that it won't be good enough anytime soon.
>> >> > When we
>> >> > get there, we can reverse this.
>> >> >
>> >> > On Fri, Jun 15, 2018 at 3:32 AM Simon Willnauer
>> >> > <[email protected]>
>> >> > wrote:
>> >> >>
>> >> >> folks,
>> >> >>
>> >> >> I've gotten more active working on IndexWriter, soft-deletes, etc. in
>> >> >> the last couple of weeks. It's a blast again and I really enjoy it.
>> >> >> The one thing that is IMO not acceptable is the status of the Solr
>> >> >> tests. I have tried so many times to get them passing on several
>> >> >> different OSs, but it seems pretty hopeless. It gets even worse: the
>> >> >> Lucene/Solr QA job literally marks every ticket I attach a patch to
>> >> >> as `-1` because of arbitrary Solr test failures. Here is an example:
>> >> >>
>> >> >> || Reason || Tests ||
>> >> >> | Failed junit tests | solr.rest.TestManagedResourceStorage |
>> >> >> |   | solr.cloud.autoscaling.SearchRateTriggerIntegrationTest |
>> >> >> |   | solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest |
>> >> >> |   | solr.client.solrj.impl.CloudSolrClientTest |
>> >> >> |   | solr.common.util.TestJsonRecordReader |
>> >> >>
>> >> >> Speaking to other committers I hear we should just disable this job.
>> >> >> Sorry, WTF?
>> >> >>
>> >> >> These tests seem to fail all the time, randomly, over and over again.
>> >> >> That renders these tests entirely useless to me. I even invest time
>> >> >> (wrong: I invested) looking into whether the failures are caused by
>> >> >> me or whether I can do something about them. Yet someone could call
>> >> >> me out for being responsible for them as a committer; yes I am,
>> >> >> hence this email. But I don't think I am obliged to fix them. These
>> >> >> projects have 50+ committers, and having a shared codebase doesn't
>> >> >> mean everybody has to take care of everything. I think we are at the
>> >> >> point where, if I work on Lucene, I won't run the Solr tests at all,
>> >> >> otherwise there won't be any progress. On the other hand, if the
>> >> >> Solr tests never pass, I wonder whether the Solr codebase gets
>> >> >> changed regardless? That is again a terrible situation.
>> >> >>
>> >> >> I spoke to Varun and Anshum during Buzzwords to see whether they could
>> >> >> give me some hints about what I am doing wrong, but it seems this is
>> >> >> just the way it is. I feel terrible pushing stuff to our repo while
>> >> >> still seeing our tests fail. I get ~15 build failures from Solr tests
>> >> >> a day, and I am not the only one who has mail filters to archive them
>> >> >> if there isn't a Lucene test among the failures.
>> >> >>
>> >> >> This is a terrible state, folks; how do we fix it? Lucene land gets
>> >> >> much love on the testing end, but that also requires putting work
>> >> >> into it, and I expect Solr to do the same. That in turn may require
>> >> >> a stop to pushing new stuff until the situation is under control.
>> >> >> Marking things as bad apples by itself isn't the answer; this
>> >> >> requires effort from the drivers behind this project.
>> >> >>
>> >> >> simon
>> >> >>
>> >> >>
>> >> > --
>> >> > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>> >> > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>> >> > http://www.solrenterprisesearchserver.com
>> >>
>> >>
>> > --
>> > - Mark
>> > about.me/markrmiller
>>
>>
>>
>>
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
