Hi Mark,

Thanks for taking care of this.
>> preventing new tests that are not solid.

What are your thoughts on this? While keeping track of recently introduced
and flaky tests is one way forward, would it make sense to have some sort of
automated test run *before* committing the changes? I feel it would certainly
help contributors like me to improve the quality of a patch before it is
committed to trunk. Many Apache projects provide this type of infrastructure
support, e.g. when we submit a patch in Hadoop, an automated Jenkins bot
provides feedback about various aspects of the patch: checkstyle errors,
unit test failures, javadocs, etc.

Thoughts?

Thanks
-Hrishikesh

On Thu, Feb 9, 2017 at 2:45 PM, Mark Miller <[email protected]> wrote:

> bq. a long time ago and sadly failed.
>
> It's really quite a difficult problem. Other than cutting out tons of test
> coverage, there has been no easy way to get to the top of the hill and
> stay there.
>
> I've gone after this with mostly sweat and time in the past. I still
> remember one Christmas day 4 or 5 years ago when I had all the Apache and
> Policeman and my own Jenkins jobs green for the main branches I tracked
> (like 10-14 jobs?) for the first (and, as it turned out, only) time at
> once. I've also set up my own Jenkins machine and jobs at least 3 or 4
> times for months on end, with simple job-pass-percentage plugins and other
> things that many others have done. It's all such a narrow, tiny view into
> the real data though. You can just try to tread water, or go out and try
> to physically burn down tests.
>
> It's really not as simple as just donating time. Not unless everyone did a
> lot more of it. First off, you can spend a ton of time making many, many
> tests go from failing 5 in 25 times to something like 1 in 60 or 1 in 100,
> but dozens of tests failing even that often, being run all the time, is
> still a huge problem that will grow and fester in our current situation.
> So how do you even see where you are, or ensure any progress made is kept?
> Random changes from many other areas bring once-solid tests into worse
> shape, other tests accumulate new problems because no one is looking into
> tests that commonly fail, etc. Many of these tests provide critical test
> coverage.
>
> You really have to know where to focus the time. Sometimes hardening a
> test is pretty quick, but often it's hours or even days of time fighting
> mischievous little issues.
>
> The only way I can see we can get anywhere we can hold is by generating
> the proper data to tell us what the situation is, and to make it very
> simple to track that situation over time.
>
> It's also something a bit more authoritative than one committer's opinion
> when it comes to pushing on authors to harden tests. Some tests mainly
> fail on Jenkins, some tests mainly fail for this guy or on that platform,
> "how can you bug me about my test, I see your test fail", etc. It's
> because there are just so many ways for tests to have a problem and so
> many ways to run the unit tests (I run ant tests with 10 JVMs and 6 cores,
> others with 2 and 2, or even more). But in a fair and reasonable
> environment, running a test 100 times, 10 times in parallel, is a
> fantastic, neutral, and probably useful data point. If a test fails 40% of
> the time under conditions that the large majority of other tests can
> survive, then "I can't see it in my env" loses most of its weight. Improve
> or @BadApple.
>
> I've banged my head against this wall more than once, and maybe this
> amounts to about as much, but I've been thinking of this 'beasting test
> run report' for many years, and I really think current and reliable data
> is a required step to get this right.
>
> Beyond that, there are not too many people comfortable just hopping around
> all areas of the code and hardening a broad range of tests, but I'll put
> in my fair share again and see if something sparks.
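The neutral data point Mark describes (run a test N times, M in parallel, and record the failure percentage) can be sketched in a few lines. This is a hypothetical illustration, not the actual beasting harness; `run_test` and `flaky_stub` stand in for whatever actually invokes a single test run.

```python
from concurrent.futures import ThreadPoolExecutor

def failure_percentage(run_test, iterations=100, parallelism=10):
    """Run `run_test` `iterations` times, `parallelism` at a time,
    and return the percentage of runs that failed (returned False)."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        results = list(pool.map(run_test, range(iterations)))
    failures = sum(1 for passed in results if not passed)
    return 100.0 * failures / iterations

# Stand-in "test" that fails deterministically on 2 of every 5 runs.
def flaky_stub(run_index):
    return run_index % 5 >= 2  # True = pass

print(failure_percentage(flaky_stub))  # 40.0
```

A report built from numbers like this is exactly the kind of neutral evidence that survives "it passes in my environment" arguments.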
> If we can get to a better point, I'll be happy to help police the
> appearance of new flaky tests.
>
> Even now I will place extra focus on preventing new tests that are not
> solid. The report helps me monitor that by trying to use git to get a
> creation date for each test to display, and by listing the last 3
> failure-percentage results (newer tests will have 0, 1, or 2 entries).
>
> I've spent a lot of time just getting it all automated, building my
> confidence in its results, and driving down the time it takes to generate
> a report so that I can do it more frequently and/or for much longer
> iterations.
>
> Over time there are a lot of little ways to improve efficiency on that
> front though. For example, new tests can be run for many more iterations,
> tests that have failed before can be run for more iterations, etc. Linking
> the reports together gives us some ammo for choosing how much time and
> beasting we should spend on a test, and for seeing whether a test is
> improving, getting worse, etc. Data we can act on with confidence. I also
> have all these reports as tsv files, so they can be easily imported into
> just about any system for longer or more intense cross-report tracking.
>
> - Mark
>
>
> On Thu, Feb 9, 2017 at 3:39 PM Dawid Weiss <[email protected]> wrote:
>
>> This is a very important, hard and thankless task. Thanks for doing this
>> Mark. As you know I tried (with your help!) to clean some of this mess up
>> a long time ago and sadly failed. It'd be great to speed those tests up
>> and make them more robust.
>>
>> Dawid
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>
> --
> - Mark
> about.me/markrmiller
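Since the reports are kept as tsv files, the cross-report tracking Mark mentions (e.g. a test's last 3 failure percentages) is a small scripting exercise. The column names and layout below are guessed for illustration; the real report format may differ.

```python
import csv
import io

# Hypothetical reports: one tsv per beasting run, oldest first.
report_tsvs = [
    "test\tfail_pct\nTestFoo\t8.0\nTestBar\t0.0\n",
    "test\tfail_pct\nTestFoo\t5.0\nTestBar\t1.0\n",
    "test\tfail_pct\nTestFoo\t2.0\nTestBar\t4.0\n",
]

def last_n_fail_pcts(tsv_texts, n=3):
    """Return, per test, its failure percentages from the last n reports."""
    history = {}
    for text in tsv_texts:
        for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
            history.setdefault(row["test"], []).append(float(row["fail_pct"]))
    return {name: pcts[-n:] for name, pcts in history.items()}

trends = last_n_fail_pcts(report_tsvs)
print(trends["TestFoo"])  # [8.0, 5.0, 2.0] -- improving
print(trends["TestBar"])  # [0.0, 1.0, 4.0] -- getting worse
```

Newer tests would simply have fewer than three entries, matching the "0, 1, or 2 entries" behavior described above.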
