bq. a long time ago and sadly failed.

It's really quite a difficult problem. Other than cutting out tons of test
coverage, there has been no easy way to get to the top of the hill and stay
there.

I've gone after this with mostly sweat and time in the past. I still
remember one xmas day 4 or 5 years ago when, for the first (and, as far as
I ever saw, only) time, I had the Apache and Policeman and my own Jenkins
all green at once for the main branches I tracked (like 10-14 jobs?). I've
also set up my own Jenkins machine and jobs at least 3 or 4 times for
months on end, with simple job pass-percentage plugins and other things
that many others have done. It's all such a narrow, tiny view into the real
data though. You can just try and tread water, or go out and try and
physically burn down tests.

It's really not as simple as just donating time. Not unless everyone did a
lot more of it. First off, you can spend a ton of time making many, many
tests go from failing 5 in 25 runs to something like 1 in 60 or 1 in 100,
but dozens of tests failing even that often, being run all the time, is
still a huge problem that will grow and fester in our current situation. So
how do you even see where you are, or ensure any progress made is kept?
Random changes from many other areas bring once-solid tests into worse
shape, other tests accumulate new problems because no one is looking into
tests that commonly fail, etc, etc. Many of these tests provide critical
test coverage.
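
To make the scale of that concrete, here's a back-of-the-envelope sketch
(the numbers are illustrative, not from any report): even a pile of
rarely-failing tests still poisons most full runs.

    # Illustrative only: if 50 independent tests each fail 1 in 100 runs,
    # how often does a full run show at least one failure?
    p_all_pass = 0.99 ** 50
    print(f"at least one failure in {100 * (1 - p_all_pass):.0f}% of full runs")
    # prints: at least one failure in 39% of full runs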

You really have to know where to focus the time. Sometimes hardening a test
is pretty quick, but often it's hours or even days of fighting mischievous
little issues.

The only way I can see us getting anywhere we can hold is by generating the
proper data to tell us what the situation is, and by making it very simple
to track that situation over time.

It also takes something a bit more authoritative than one committer's
opinion when it comes to pushing authors to harden tests. Some tests mainly
fail on Jenkins, some tests mainly fail for this guy or on that platform,
"how can you bug me about my test, I see your test fail", etc. It's because
there are just so many ways for tests to have a problem and so many ways to
run the unit tests (I run ant tests with 10 JVMs and 6 cores, others with 2
and 2, or even more). But in a fair and reasonable environment, running a
test 100 times, 10 at a time in parallel, is a fantastic, neutral, and
probably useful data point. If a test fails something like 40% of the time
in an environment the large majority of other tests survive, then "I can't
see it in my env" loses most of its weight. Improve or @BadApple.
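
For illustration, here's a minimal sketch of that kind of beasting run in
Python. It assumes you can select a single suite with the build's
-Dtests.class property; the test name and counts are placeholders, and
failure is judged only by ant's exit code.

    # beast.py - minimal sketch: run one suite many times in parallel and
    # report a neutral failure percentage. Names are assumptions.
    import concurrent.futures
    import subprocess

    TEST = "org.apache.solr.cloud.SomeTest"  # hypothetical suite name
    ITERATIONS = 100
    PARALLEL = 10

    def run_once(i):
        # Each iteration is a fresh ant invocation; a non-zero exit code
        # counts as a failure.
        result = subprocess.run(["ant", "test", f"-Dtests.class={TEST}"],
                                capture_output=True)
        return result.returncode == 0

    with concurrent.futures.ThreadPoolExecutor(max_workers=PARALLEL) as pool:
        outcomes = list(pool.map(run_once, range(ITERATIONS)))

    failures = outcomes.count(False)
    print(f"{TEST}: {failures}/{ITERATIONS} failed "
          f"({100.0 * failures / ITERATIONS:.1f}%)")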

I've banged my head against this wall more than once, and maybe this
amounts to about as much, but I've been thinking of this 'beasting test run
report' for many years and I really think current and reliable data is a
required step to get this right.

Beyond that, there are not too many people comfortable just hopping around
all areas of the code and hardening a broad range of tests, but I'll put in
my fair share again and see if something sparks.

If we can get to a better point, I'll be happy to help police
the appearance of new flaky tests.

Even now I will place extra focus on preventing new tests that are not
solid. The report helps me monitor that by using git to get a creation date
for each test to display, and by listing the last 3 failure-percentage
results (newer tests will have 0, 1, or 2 entries).
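
A minimal sketch of the git part, assuming each test suite lives in its own
file (the path below is hypothetical):

    # Sketch: find the date a test file was added to the repo, so newer
    # tests can be flagged in the report.
    import subprocess

    def test_create_date(path):
        # --diff-filter=A keeps only commits that add the file; --follow
        # tracks renames. The last line of output is the original add.
        out = subprocess.run(
            ["git", "log", "--diff-filter=A", "--follow",
             "--format=%aI", "--", path],
            capture_output=True, text=True, check=True).stdout.strip()
        return out.splitlines()[-1] if out else None

    print(test_create_date(
        "solr/core/src/test/org/apache/solr/cloud/SomeTest.java"))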

I've spent a lot of time just getting it all automated, building my
confidence in its results, and driving down the time it takes to generate a
report so that I can do it more frequently and/or for much longer
iterations.

Over time there are a lot of little ways to improve efficiency on that
front though. For example, new tests can be run for many more iterations,
tests that have failed before can be run for more iterations, etc. Linking
the reports together gives us some ammo for choosing how much time and
beasting we should spend on a test, and for seeing whether a test is
improving, getting worse, etc. Data we can act on with confidence. I also
have all these reports as tsv files, so they can be easily imported into
just about any system for longer or more intense cross-report tracking or
something.
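
As a rough sketch of that cross-report linking (the file names and the
two-column test/failure-percentage layout are assumptions, not the report's
actual schema):

    # Sketch: read per-test failure percentages from several report TSVs
    # and print each test's history, oldest report first.
    import csv
    from collections import defaultdict

    REPORTS = ["report-1.tsv", "report-2.tsv", "report-3.tsv"]  # illustrative

    history = defaultdict(list)
    for report in REPORTS:
        with open(report, newline="") as f:
            for name, pct in csv.reader(f, delimiter="\t"):
                history[name].append(float(pct))

    for name, pcts in sorted(history.items()):
        trend = "improving" if len(pcts) > 1 and pcts[-1] < pcts[0] else "watch"
        print(f"{name}\t{pcts}\t{trend}")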

- Mark


On Thu, Feb 9, 2017 at 3:39 PM Dawid Weiss <[email protected]> wrote:

> This is a very important, hard and ungrateful task. Thanks for doing
> this Mark. As you know I tried (with your help!) to clean some of this
> mess a long time ago and sadly failed. It'd be great to speed those
> tests up and make them more robust.
>
> Dawid
>
--
- Mark
about.me/markrmiller
