Erick and Steve, I am trying to answer both of you in one email.

Erick, I didn't want to tick you off. I appreciate you being on it and
staying on top of these failures. I am sorry if you read it that way.

Steve, regarding the "Test Confidence" role email from Cassandra: I
can appreciate the effort here, but it only fights symptoms. It's as
if your mattress were so old that it gives you a headache every day
and you took Advil to fix it. These issues are a fundamental problem,
and a fundamental change is needed to fix them. There must be a mind
shift towards reproducible software testing that doesn't rely on
spawning nodes for fun and profit. Anything that spawns real nodes
will always have this problem if you run significantly complex tests
against them.
I can take a step back and tell you we had the same issue in
Elasticsearch. When we couldn't cope with it anymore, we spent a
significant amount of work changing our approach to testing. There
are many, many features, like the entire sequence ID layer, that were
blocked on a unit-testing framework that lets us simulate all kinds
of networking issues. It took several months for this one framework
to be built, and we didn't work on the feature until it was done.
This requires a ton of discipline, and it also means you don't add
features unless you can actually test them effectively. We added more
than 1k unit tests to our suites in a year, and it slowed us down in
the beginning. Now we are in a much better place.
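
To make this concrete, here is a minimal sketch of the idea. The
names and the API are invented for illustration; this is not the
actual Elasticsearch framework. The point is that the "network" is an
in-memory object driven by a seed, so a failing run reproduces
exactly:

    import java.util.ArrayDeque;
    import java.util.Queue;
    import java.util.Random;

    // Hypothetical stand-in for a simulated transport: messages are
    // dropped by a seeded Random, never by a real socket.
    final class SimulatedTransport {
        private final Random random;          // seeded -> deterministic
        private final double dropProbability; // injected fault rate
        private final Queue<String> delivered = new ArrayDeque<>();

        SimulatedTransport(long seed, double dropProbability) {
            this.random = new Random(seed);
            this.dropProbability = dropProbability;
        }

        // Delivers or silently drops a message, like a flaky network.
        void send(String message) {
            if (random.nextDouble() >= dropProbability) {
                delivered.add(message);
            }
        }

        Queue<String> delivered() {
            return delivered;
        }
    }

A test wires its replication or sequence-ID logic against something
like this; the same seed always produces the same message loss on
every machine, which is exactly what spawning real nodes can never
give you.
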
I am happy to share my experience with this, and why most of our
integration tests use a declarative language that essentially
prevents them from getting too crazy. There are also some tests that
exercise the distributed system heavily, and there are ongoing
debates about how much they buy us; they also fail, and they are a
pain in the ass. Yet when they do fail we block releases; there is a
massive responsibility we have here.

I also think we can't rely on a test-triage cadence from a single
company. If we do, we have a PMC issue here, but I think
diversity-wise we are in a good place, which is great. We have to fix
this so it works without a company sponsoring test triage. I hope we
are on the same page here.

simon

On Tue, Jun 19, 2018 at 5:44 PM, Erick Erickson <[email protected]> wrote:
> " This is not good and bad-appeling these test isn't the answer unless
> we put a lot of effort into it, sorry I don't see it happening."
>
> This ticks me off. I've spent considerable time over the last 4 months
> trying to get to a point where we can stop getting _worse_ as a
> necessary _first_ step to getting better. Lucidworks is putting effort
> in that direction too. What other concrete actions do you recommend
> going forward?
>
> Of course just BadApple-ing tests isn't a satisfactory answer. And the
> e-mail filters I've arranged that allow me to only see failures that
> do _not_ run BadApple tests are dangerous and completely crappy.
> Unfortunately I don't have a magic wand to make it all better so this
> stop-gap (I hope) allows progress.
>
> We'll know progress is being made when the weekly BadApple reports
> show a declining number of tests that are annotated. Certainly not
> there yet, but working on it.
>
> Perhaps you missed the point of the BadApple exercise. Reasonably soon
> I hope to be at a point where we can draw a line in the sand where we
> can say "This is a new failure, fix it or roll back the changes". Then
> can we get persnickety about not adding _new_ failures. Then we can
> reduce the backlog.
>
> And the result of these efforts may be me curling into a ball and
> sucking my thumb because the problem is intractable. We'll see.
>
> One temporary-but-maybe-necessary option is to run with BadApple
> enabled. I don't like that either, but it's better than not running
> tests at all.
>
> Unfortunately when I'm working on code I have to do another crappy
> work-around; run the tests and then re-run any failing tests and
> assume if the run is successful that it was a flaky test. The BadApple
> annotations are helpful for that too since soon I hope to have
> confidence that we've annotated most all the flaky tests and if I can
> run failing tests successfully _and_ they're annotated it's probably
> OK. Horrible process, no question about that but I have to start
> somewhere.
>
> Again, what additional steps do you recommend?
>
> Erick
>
> On Tue, Jun 19, 2018 at 7:29 AM, Steve Rowe <[email protected]> wrote:
>> Hi Simon,
>>
>> Have you seen the late-February thread “Test failures are out of control….”? 
>> : 
>> https://lists.apache.org/thread.html/b783a9d7c22f518b07355e8e4f2c6f56020a7c32f36a58a86d51a3b7@%3Cdev.lucene.apache.org%3E
>>
>> If not, I suggest you go take a look.  Some of your questions are answered 
>> there.
>>
>> --
>> Steve
>> www.lucidworks.com
>>
>>> On Jun 19, 2018, at 9:41 AM, Simon Willnauer <[email protected]> 
>>> wrote:
>>>
>>> Thanks folks, I appreciate you are sharing some thoughts about this. My 
>>> biggest issue is that this is a permanent condition. I could have sent this 
>>> mail 2, 4 or 6 years ago and it would have been as relevant as today.
>>>
>>> I am convinced mark can make some progress but this isn't fixable by a 
>>> single person this is a structural problem or rather a cultural. I am not 
>>> sure if everybody is aware of how terrible it is. I took a screenshot of my 
>>> inbox the other day what I have to dig through on a constant basis 
>>> everytime I commit a change to lucene to make sure I am not missing 
>>> something.
>>>
>>> <image.png>
>>>
>>> I don't even know how we can attract any new contributors or how many 
>>> contributors have been scared away by this in the past. This is not good 
>>> and bad-appeling these test isn't the answer unless we put a lot of effort 
>>> into it, sorry I don't see it happening. I would have expected more than 
>>> like 4 people from this PMC to reply to something like this. From my 
>>> perspective there is a lot of harm done by this to the project and we have 
>>> to figure out what we wanna do. This also affects our ability to release, 
>>> guys our smoke-test builds never pass [1]. I don't know what to do if I 
>>> were a RM for 7.4 (thanks adrien for doing it) Like I can not tell what is 
>>> serious and what not on a solr build. It's also not just be smoke tester 
>>> it's basically everything that runs after solr that is skipped on a regular 
>>> basis.
>>>
>>> I don't have a good answer but we have to get this under control it's 
>>> burdensome for lucene to carry this load and it's carrying it a quite some 
>>> time. It wasn't very obvious how big this weights since I wasn't working on 
>>> lucene internals for quite a while and speaking to many folks around here 
>>> this is on their shoulders but it's not brought up for discussion, i think 
>>> we have to.
>>>
>>> simon
>>>
>>> [1] https://builds.apache.org/job/Lucene-Solr-SmokeRelease-7.4/
>>>
>>>
>>> On Sat, Jun 16, 2018 at 6:40 AM, Erick Erickson <[email protected]> 
>>> wrote:
>>> Martin:
>>>
>>> I have no idea how logging severity levels apply to unit tests that fail. 
>>> It's not a question of triaging logs, it's a matter of Jenkins junit test 
>>> runs reporting failures.
>>>
>>>
>>>
>>> On Fri, Jun 15, 2018 at 4:25 PM, Martin Gainty <[email protected]> wrote:
>>> Erick-
>>>
>>> appears that style mis-application may be categorised as INFO
>>> are mixed in with SEVERE errors
>>>
>>> Would it make sense to filter the errors based on severity ?
>>>
>>> https://docs.oracle.com/javase/7/docs/api/java/util/logging/Level.html
>>> Level (Java Platform SE 7 ) - Oracle Help Center
>>> docs.oracle.com
>>> The Level class defines a set of standard logging levels that can be used 
>>> to control logging output. The logging Level objects are ordered and are 
>>> specified by ordered integers.
>>> if you know Severity you can triage the SEVERE errors before working down 
>>> to INFO errors
>>>
>>>
>>> WDYT?
>>> Martin
>>> ______________________________________________
>>>
>>>
>>>
>>> From: Erick Erickson <[email protected]>
>>> Sent: Friday, June 15, 2018 1:05 PM
>>> To: [email protected]; Mark Miller
>>> Subject: Re: Status of solr tests
>>>
>>> Mark (and everyone).
>>>
>>> I'm trying to be somewhat conservative about what I BadApple, at this
>>> point it's only things that have failed every week for the last 4.
>>> Part of that conservatism is to avoid BadApple'ing tests that are
>>> failing and _should_ fail.
>>>
>>> I'm explicitly _not_ delving into any of the causes at all at this
>>> point, it's overwhelming until we reduce the noise as everyone knows.
>>>
>>> So please feel totally free to BadApple anything you know is flakey,
>>> it won't intrude on my turf ;)
>>>
>>> And since I realized I can also report tests that have _not_ failed in
>>> a month that _are_ BadApple'd, we can be a little freer with
>>> BadApple'ing tests since there's a mechanism for un-annotating them
>>> without a lot of tedious effort.
>>>
>>> FWIW.
>>>
>>> On Fri, Jun 15, 2018 at 9:09 AM, Mark Miller <[email protected]> wrote:
>>> > There is an okay chance I'm going to start making some improvements here 
>>> > as
>>> > well. I've been working on a very stable set of tests on my starburst 
>>> > branch
>>> > and will slowly bring in test fixes over time (I've already been making 
>>> > some
>>> > on that branch for important tests). We should currently be defaulting to
>>> > tests.badapples=false on all solr test runs - it's a joke to try and get a
>>> > clean run otherwise, and even then somehow 4 or 5 tests that fail somewhat
>>> > commonly have so far avoided Erick's @BadApple hack and slash. They are 
>>> > bad
>>> > appled on my dev branch now, but that is currently where any time I have 
>>> > is
>>> > spent rather than on the main dev branches.
>>> >
>>> > Also, too many flakey tests are introduced because devs are not beasting 
>>> > or
>>> > beasting well before committing new heavy tests. Perhaps we could add some
>>> > docs around that.
>>> >
>>> > We have built in beasting support, we need to emphasize that a couple 
>>> > passes
>>> > on a new test is not sufficient to test it's quality.
>>> >
>>> > - Mark
>>> >
>>> > On Fri, Jun 15, 2018 at 9:46 AM Erick Erickson <[email protected]>
>>> > wrote:
>>> >>
>>> >> (Siiiiggggghhhh) All very true. You're not alone in your frustration.
>>> >>
>>> >> I've been trying to at least BadApple tests that fail consistently, so
>>> >> another option could be to disable BadApple'd tests. My hope has been
>>> >> to get to the point of being able to reliably get clean runs, at least
>>> >> when BadApple'd tests are disabled.
>>> >>
>>> >> From that point I want to draw a line in the sand and immediately
>>> >> address tests that fail that are _not_ BadApple'd. At least then we'll
>>> >> stop getting _worse_. And then we can work on the BadApple'd tests.
>>> >> But as David says, that's not going to be any time soon. It's been a
>>> >> couple of months that I've been trying to just get the tests
>>> >> BadApple'd without even trying to fix any of them.
>>> >>
>>> >> It's particularly pernicious because with all the noise we don't see
>>> >> failures we _should_ see.
>>> >>
>>> >> So I don't have any good short-term answer either. We've built up a
>>> >> very large technical debt in the testing. The first step is to stop
>>> >> adding more debt, which is what I've been working on so far. And
>>> >> that's the easy part....
>>> >>
>>> >> Siiiiiiiiiiiiiigggggggggghhhhhhhhhh
>>> >>
>>> >> Erick
>>> >>
>>> >>
>>> >> On Fri, Jun 15, 2018 at 5:29 AM, David Smiley <[email protected]>
>>> >> wrote:
>>> >> > (Sigh) I sympathize with your points Simon.  I'm +1 to modify the
>>> >> > Lucene-side JIRA QA bot (Yetus) to not execute Solr tests.  We can and
>>> >> > are
>>> >> > trying to improve the stability of the Solr tests but even
>>> >> > optimistically
>>> >> > the practical reality is that it won't be good enough anytime soon.
>>> >> > When we
>>> >> > get there, we can reverse this.
>>> >> >
>>> >> > On Fri, Jun 15, 2018 at 3:32 AM Simon Willnauer
>>> >> > <[email protected]>
>>> >> > wrote:
>>> >> >>
>>> >> >> folks,
>>> >> >>
>>> >> >> I got more active working on IndexWriter and Soft-Deletes etc. in the
>>> >> >> last couple of weeks. It's a blast again and I really enjoy it. The
>>> >> >> one thing that is IMO not acceptable is the status of solr tests. I
>>> >> >> tried so many times to get them passing on several different OSs but
>>> >> >> it seems this is pretty hopepless. It's get's even worse the
>>> >> >> Lucene/Solr QA job literally marks every ticket I attach a patch to as
>>> >> >> `-1` because of arbitrary solr tests, here is an example:
>>> >> >>
>>> >> >> || Reason || Tests ||
>>> >> >> | Failed junit tests | solr.rest.TestManagedResourceStorage |
>>> >> >> |   | solr.cloud.autoscaling.SearchRateTriggerIntegrationTest |
>>> >> >> |   | solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest |
>>> >> >> |   | solr.client.solrj.impl.CloudSolrClientTest |
>>> >> >> |   | solr.common.util.TestJsonRecordReader |
>>> >> >>
>>> >> >> Speaking to other committers I hear we should just disable this job.
>>> >> >> Sorry, WTF?
>>> >> >>
>>> >> >> These tests seem to fail all the time, randomly and over and over
>>> >> >> again. This renders the test as entirely useless to me. I even invest
>>> >> >> time (wrong, I invested) looking into it if they are caused by me or
>>> >> >> if I can do something about it. Yet, someone could call me out for
>>> >> >> being responsible for them as a commiter, yes I am hence this email. I
>>> >> >> don't think I am obliged to fix them. These projects have 50+
>>> >> >> committers and having a shared codebase doesn't mean everybody has to
>>> >> >> take care of everything. I think we are at the point where if I work
>>> >> >> on Lucene I won't run solr tests at all otherwise there won't be any
>>> >> >> progress. On the other hand solr tests never pass I wonder if the solr
>>> >> >> code-base gets changes nevertheless? That is again a terrible
>>> >> >> situation.
>>> >> >>
>>> >> >> I spoke to varun and  anshum during buzzwords if they can give me some
>>> >> >> hints what I am doing wrong but it seems like the way it is. I feel
>>> >> >> terrible pushing stuff to our repo still seeing our tests fail. I get
>>> >> >> ~15 build failures from solr tests a day I am not the only one that
>>> >> >> has mail filters to archive them if there isn't a lucene tests in the
>>> >> >> failures.
>>> >> >>
>>> >> >> This is a terrible state folks, how do we fix it? It's the lucene land
>>> >> >> that get much love on the testing end but that also requires more work
>>> >> >> on it, I expect solr to do the same. That at the same time requires
>>> >> >> stop pushing new stuff until the situation is under control. The
>>> >> >> effort of marking stuff as bad apples isn't the answer, this requires
>>> >> >> effort from the drivers behind this project.
>>> >> >>
>>> >> >> simon
>>> >> >>
>>> >> >> ---------------------------------------------------------------------
>>> >> >> To unsubscribe, e-mail: [email protected]
>>> >> >> For additional commands, e-mail: [email protected]
>>> >> >>
>>> >> > --
>>> >> > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>>> >> > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>>>
>>> David Smiley - Lucene/Solr Search Developer / Consultant - D W Smiley LLC | 
>>> LinkedIn
>>> linkedin.com
>>> View David Smiley’s profile on LinkedIn, the world's largest professional 
>>> community. David has 3 jobs listed on their profile. See the complete 
>>> profile on LinkedIn and discover David’s connections and jobs at similar 
>>> companies.
>>>
>>>
>>> >> > http://www.solrenterprisesearchserver.com
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe, e-mail: [email protected]
>>> >> For additional commands, e-mail: [email protected]
>>> >>
>>> > --
>>> > - Mark
>>> > about.me/markrmiller
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
>>>
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
