One thing that any contributor can help with is beasting their own tests before committing as well. I didn't mention that some amount of this can also be done as part of the dev process ;) There is the built-in ant beast target that is super simple to use. I like to use my beasting script because of how it handles logs, results, customizations, and independent test runs, but for your average dev, a little ant beast usage could go a long way with minimal effort. Automation is always best, but some good test culture ain't a bad idea to push either.
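The core of what a beasting script does fits in a few lines of Python. This is a minimal sketch, not my actual script — `cmd` is a placeholder for whatever runs one iteration of your test (an ant invocation, say):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def beast(cmd, iters=100, parallel=10):
    """Run `cmd` `iters` times, `parallel` at a time, and return
    the number of failing runs. `cmd` is any shell command that
    exits non-zero on failure (an ant test invocation, say)."""
    def one_run(i):
        result = subprocess.run(cmd, shell=True, capture_output=True)
        return result.returncode != 0
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        return sum(pool.map(one_run, range(iters)))
```

A real script also has to give each run its own working directory, capture logs per iteration, and make sure `parallel` times ~512MB of heap actually fits in RAM — that's most of what the custom handling is for.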
- Mark

On Thu, Feb 9, 2017 at 6:53 PM Mark Miller <[email protected]> wrote:

> bq. I feel it would certainly help contributors like me to improve the quality of a patch before it is committed to trunk. There are many Apache projects which provide this type of infrastructure support, e.g. when we submit a patch in Hadoop, an automated Jenkins bot provides feedback about various aspects of the patch, e.g. check-style errors, unit test failures, javadocs, etc.
>
> Well, at some point, if this is going to work or survive over the longer term, it will have to evolve and automate and improve like that to some degree.
>
> We will see though. One of the biggest problems is the hardware. We have access to some Apache Jenkins machines that are already quite busy and probably not the greatest candidates for that job (if it was even possible to steal one from the Jenkins cluster). Meanwhile, last I knew, Apache does not allow companies to donate hardware to Apache for a specific purpose or project. So hardware for full automation is one issue. Beasting a test well can be done in 5-30 minutes in most cases on decent hardware with enough RAM (tests potentially need 512MB for heap alone each, so a decent amount of RAM or fast swap is required for lots to go in parallel). More in parallel tends to produce fails faster (though too many will obviously overload the hardware and not be very useful either).
>
> Back when I was talking about a full beasting test run with Greg Chanan a couple years back, my main thought was we don't have the hardware for it. Some company could do it, but how do you actually integrate into our automated process that way over the long term? Companies come and go, interest and money come and go, etc. So what is the sustainable plan? I'm not sure, but I think driving in the right direction might present some kind of new possibilities.
> When I brought up the unlikeliness of the build system or reporting or hardware showing up in a way that works for the project anytime soon, Greg brought up the same type of idea for a post-commit job type thing that could auto-track and beast new tests (and perhaps altered tests) automatically and then post the results to JIRA. I still think that is a great idea if we could figure out the hardware side.
>
> For a while I'm just going to produce reports in the cloud on GCE** and provide the initial hand-holding and support this needs to try and get off the ground and reliable. If it gets any steam or helps produce any results, perhaps others will have further ideas on how to automate, find available hardware/resources, and improve the strategy.
>
> Once something is fully automated, beasting all the tests *could* theoretically become a much less common exercise. You could have pre- or post-commit hooks to look at tests in patches or recent commits, you could have a job that randomly selects tests to beast every day or other day, some of those beastings could be for a lot of runs, and of course you could provide extra coverage to tests that have a poor history.
>
> Anyhow, I'll fake a bit of that for a while once I get a little more ramped up. But yeah, eventually more automation and strategy will be important.
>
> - Mark
>
> ** Thank you Cloudera! That last report was probably $24 excluding my labor with 10 machines - the first report on a single machine cost much more.
>
> On Thu, Feb 9, 2017 at 6:02 PM Hrishikesh Gadre <[email protected]> wrote:
>
> Hi Mark,
>
> Thanks for taking care of this.
>
> >> preventing new tests that are not solid.
>
> What are your thoughts on this? While keeping track of recently introduced and flaky tests is one way to go forward, would it make sense to have some sort of automated test run *before* committing the changes?
> I feel it would certainly help contributors like me to improve the quality of a patch before it is committed to trunk. There are many Apache projects which provide this type of infrastructure support, e.g. when we submit a patch in Hadoop, an automated Jenkins bot provides feedback about various aspects of the patch, e.g. check-style errors, unit test failures, javadocs, etc.
>
> Thoughts?
>
> Thanks
> -Hrishikesh
>
> On Thu, Feb 9, 2017 at 2:45 PM, Mark Miller <[email protected]> wrote:
>
> bq. a long time ago and sadly failed.
>
> It's really a quite difficult problem. Other than cutting out tons of test coverage, there has been no easy way to get to the top of the hill and stay there.
>
> I've gone after this with mostly sweat and time in the past. I still remember one xmas day 4 or 5 years ago when I had all the apache and policeman and my jenkins green for the main branches I tracked (like 10-14 jobs?) for the first (and only time I ended up seeing) time at once. I've also set up my own jenkins machine and jobs at least 3 or 4 times for months on end, with simple job percentage passing plugins and other things that many others have done. It's all such a narrow tiny view into the real data though. You can just try and tread water or go out and try and physically burn down tests.
>
> It's really not so simple as just donating time. Not unless everyone did a lot more of it. First off, you can spend a ton of time making many, many tests go from failing 5 in 25 times to something like 1 in 60 or 1 in 100, but dozens of tests failing even that often, being run all the time, is still a huge problem that will grow and fester in our current situation. So how do you even see where you are or ensure any progress made is kept? Random changes from many other areas bring once solid tests into worse shape, other tests accumulate new problems because no one is looking into tests that commonly fail, etc, etc.
> Many of these tests provide critical test coverage.
>
> You really have to know where to focus the time. Sometimes hardening a test is pretty quick, but often it's hours or even days of time fighting mischievous little issues.
>
> The only way I can see we can get anywhere that we can hold is by generating the proper data to tell us what the situation is and to make it very simple to track that situation over time.
>
> It's also something a bit more authoritative than one committer's opinion when it comes to pushing on authors to harden tests. Some tests mainly fail on jenkins, some tests mainly fail for this guy or on that platform, "how can you bug me about my test, I see your test fail", etc. It's because there are just so many ways for tests to have a problem and so many ways to run the unit tests (I run ant tests with 10 jvms and 6 cores, others with 2 and 2, or even more). But in a fair and reasonable environment, running a test 100 times, 10 times in parallel, is a fantastic, neutral, and probably useful data point. If a test fails 40% of the time in an environment the large majority of other tests can survive, then "I can't see it in my env" loses most of its weight. Improve or @BadApple.
>
> I've banged my head against this wall more than once, and maybe this amounts to about as much, but I've been thinking of this 'beasting test run report' for many years and I really think current and reliable data is a required step to get this right.
>
> Beyond that, there are not too many people comfortable just hopping around all areas of the code and hardening a broad range of tests, but I'll put in my fair share again and see if something sparks.
>
> If we can get to a better point, I'll be happy to help police the appearance of new flaky tests.
>
> Even now I will place extra focus on preventing new tests that are not solid.
> The report helps me monitor that by trying to use git to get a create date for each test to display, and by listing the last 3 failure percentage results (newer tests will have 0, 1, or 2 entries).
>
> I've spent a lot of time just getting it all automated, building my confidence in its results, and driving down the time it takes to generate a report so that I can do it more frequently and/or for much longer iterations.
>
> Over time there are a lot of little ways to improve efficiency on that front though. For example, new tests can be run for many more iterations, tests that have failed before can be run for more iterations, etc. Linking the reports together gives us some ammo for choosing how much time and beasting we should spend on a test, and whether a test is improving, getting worse, etc. Data we can act on with confidence. I also have all these reports as tsv files so they can be easily imported into just about any system for longer or more intense cross-report tracking or something.
>
> - Mark
>
> On Thu, Feb 9, 2017 at 3:39 PM Dawid Weiss <[email protected]> wrote:
>
> This is a very important, hard and ungrateful task. Thanks for doing this Mark. As you know I tried (with your help!) to clean some of this mess a long time ago and sadly failed. It'd be great to speed those tests up and make them more robust.
>
> Dawid
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
> --
> - Mark
> about.me/markrmiller

--
- Mark
about.me/markrmiller
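As a footnote to Greg's post-commit idea in the thread above: the job's first step — spotting which tests a commit touched — is the easy part. A hypothetical sketch (the `Test*`/`*Test` naming convention matches Lucene/Solr, but nothing here is an existing tool):

```python
import os

def tests_in_commit(changed_paths):
    """Filter a commit's changed file paths (e.g. the output of
    `git diff --name-only HEAD~1`) down to test class names."""
    tests = []
    for path in changed_paths:
        name = os.path.basename(path)
        if not name.endswith(".java"):
            continue
        cls = name[: -len(".java")]
        if cls.startswith("Test") or cls.endswith("Test"):
            tests.append(cls)
    return tests
```

Each name would then be handed off to the beasting harness, with the results posted back to JIRA.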
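The "100 runs, 10 in parallel" data point above is worth putting numbers on. A quick sketch of the statistics — the Wilson score interval is my choice of illustration here, not something the report itself computes:

```python
import math

def wilson_interval(fails, runs, z=1.96):
    """95% confidence interval for a test's true failure rate,
    given `fails` failures in `runs` independent runs."""
    p = fails / runs
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half = (z / denom) * math.sqrt(
        p * (1 - p) / runs + z * z / (4 * runs * runs))
    return center - half, center + half

# 40 failures in 100 runs gives an interval of roughly (0.31, 0.50) --
# nowhere near zero, so "I can't see it in my env" doesn't hold up.
```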
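Since the reports land as tsv files, the cross-report tracking mentioned above is mostly bookkeeping. A sketch, with an invented two-column layout (test name, failure percentage) standing in for the real report format:

```python
import csv
import io

def failure_history(tsv_reports):
    """Merge per-run tsv reports (test<TAB>fail_pct per line) into
    a per-test history of failure percentages, oldest report first."""
    history = {}
    for tsv in tsv_reports:
        for row in csv.reader(io.StringIO(tsv), delimiter="\t"):
            history.setdefault(row[0], []).append(float(row[1]))
    return history

def last3(history, test):
    """The last 3 failure percentage results the report would show."""
    return history.get(test, [])[-3:]
```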
