One thing that any contributor can help with is beasting their own tests before committing as well. I didn't mention that some amount of this can also be done as part of the dev process ;) There is the built-in ant beast target that is super simple to use. I like to use my beasting script because of how it handles logs, results, customizations, and independent test runs, but for your average dev, a little ant beast usage could go a long way with minimal effort. Automation is always best, but some good test culture ain't a bad idea to push either.
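The core of what a beasting script does fits in a few lines of Python. This is a minimal sketch, not my actual script — `cmd` is a placeholder for whatever runs one iteration of your test (an ant invocation, say):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def beast(cmd, iters=100, parallel=10):
    """Run `cmd` `iters` times, `parallel` at a time, and return
    the number of failing runs. `cmd` is any shell command that
    exits non-zero on failure (an ant test invocation, say)."""
    def one_run(i):
        result = subprocess.run(cmd, shell=True, capture_output=True)
        return result.returncode != 0
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        return sum(pool.map(one_run, range(iters)))
```

A real script also has to give each run its own working directory, capture logs per iteration, and make sure `parallel` times ~512MB of heap actually fits in RAM — that's most of what the custom handling is for.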
- Mark

On Thu, Feb 9, 2017 at 6:53 PM Mark Miller <[email protected]> wrote:

> bq. I feel it would certainly help contributors like me to improve the quality of a patch before it is committed to trunk. There are many Apache projects which provide this type of infrastructure support, e.g. when we submit a patch in Hadoop, an automated Jenkins bot provides feedback about various aspects of the patch, e.g. check-style errors, unit test failures, javadocs, etc.
>
> Well, at some point, if this is going to work or survive over the longer term, it will have to evolve and automate and improve like that to some degree.
>
> We will see though. One of the biggest problems is the hardware. We have access to some Apache Jenkins machines that are already quite busy and probably not the greatest candidates for that job (if it was even possible to steal one from the Jenkins cluster). Meanwhile, last I knew, Apache does not allow companies to donate hardware to Apache for a specific purpose or project. So hardware for full automation is one issue. Beasting a test well can be done in 5-30 minutes in most cases on decent hardware with enough RAM (tests potentially need 512MB for heap alone each, so a decent amount of RAM or fast swap is required for lots to go in parallel). More in parallel tends to produce fails faster (though too many will obviously overload the hardware and not be very useful either).
>
> Back when I was talking about a full beasting test run with Greg Chanan a couple years back, my main thought was we don't have the hardware for it. Some company could do it, but how do you actually integrate into our automated process that way over the long term? Companies come and go, interest and money come and go, etc. So what is the sustainable plan? I'm not sure, but I think driving in the right direction might present some kind of new possibilities.
> When I brought up the unlikeliness of the build system or reporting or hardware showing up in a way that works for the project anytime soon, Greg brought up the same type of idea for a post-commit job type thing that could auto-track and beast new tests (and perhaps altered tests) automatically and then post the results to JIRA. I still think that is a great idea if we could figure out the hardware side.
>
> For a while I'm just going to produce reports in the cloud on GCE** and provide the initial hand-holding and support this needs to try and get off the ground and reliable. If it gets any steam or helps produce any results, perhaps others will have further ideas on how to automate, find available hardware/resources, and improve the strategy.
>
> Once something is fully automated, beasting all the tests *could* theoretically become a much less common exercise. You could have pre- or post-commit hooks to look at tests in patches or recent commits, you could have a job that randomly selects tests to beast every day or other day, some of those beastings could be for a lot of runs, and of course you could provide extra coverage to tests that have a poor history.
>
> Anyhow, I'll fake a bit of that for a while once I get a little more ramped up. But yeah, eventually more automation and strategy will be important.
>
> - Mark
>
> ** Thank you Cloudera! That last report was probably $24 excluding my labor with 10 machines - the first report on a single machine cost much more.
>
> On Thu, Feb 9, 2017 at 6:02 PM Hrishikesh Gadre <[email protected]> wrote:
>
> Hi Mark,
>
> Thanks for taking care of this.
>
> >> preventing new tests that are not solid.
>
> What are your thoughts on this? While keeping track of recently introduced and flaky tests is one way to go forward, would it make sense to have some sort of automated test run *before* committing the changes?
> I feel it would certainly help contributors like me to improve the quality of a patch before it is committed to trunk. There are many Apache projects which provide this type of infrastructure support, e.g. when we submit a patch in Hadoop, an automated Jenkins bot provides feedback about various aspects of the patch, e.g. check-style errors, unit test failures, javadocs, etc.
>
> Thoughts?
>
> Thanks
> -Hrishikesh
>
> On Thu, Feb 9, 2017 at 2:45 PM, Mark Miller <[email protected]> wrote:
>
> bq. a long time ago and sadly failed.
>
> It's really a quite difficult problem. Other than cutting out tons of test coverage, there has been no easy way to get to the top of the hill and stay there.
>
> I've gone after this with mostly sweat and time in the past. I still remember one xmas day 4 or 5 years ago when I had all the apache and policeman and my jenkins green for the main branches I tracked (like 10-14 jobs?) for the first (and only time I ended up seeing) time at once. I've also set up my own jenkins machine and jobs at least 3 or 4 times for months on end, with simple job percentage passing plugins and other things that many others have done. It's all such a narrow tiny view into the real data though. You can just try and tread water or go out and try and physically burn down tests.
>
> It's really not so simple as just donating time. Not unless everyone did a lot more of it. First off, you can spend a ton of time making many, many tests go from failing 5 in 25 times to something like 1 in 60 or 1 in 100, but dozens of tests failing even that often, being run all the time, is still a huge problem that will grow and fester in our current situation. So how do you even see where you are or ensure any progress made is kept? Random changes from many other areas bring once solid tests into worse shape, other tests accumulate new problems because no one is looking into tests that commonly fail, etc, etc.
> Many of these tests provide critical test coverage.
>
> You really have to know where to focus the time. Sometimes hardening a test is pretty quick, but often it's hours or even days of time fighting mischievous little issues.
>
> The only way I can see we can get anywhere that we can hold is by generating the proper data to tell us what the situation is and to make it very simple to track that situation over time.
>
> It's also something a bit more authoritative than one committer's opinion when it comes to pushing on authors to harden tests. Some tests mainly fail on jenkins, some tests mainly fail for this guy or on that platform, "how can you bug me about my test, I see your test fail", etc. It's because there are just so many ways for tests to have a problem and so many ways to run the unit tests (I run ant tests with 10 jvms and 6 cores, others with 2 and 2, or even more). But in a fair and reasonable environment, running a test 100 times, 10 times in parallel, is a fantastic, neutral, and probably useful data point. If a test fails 40% of the time in an environment the large majority of other tests can survive, then "I can't see it in my env" loses most of its weight. Improve or @BadApple.
>
> I've banged my head against this wall more than once, and maybe this amounts to about as much, but I've been thinking of this 'beasting test run report' for many years and I really think current and reliable data is a required step to get this right.
>
> Beyond that, there are not too many people comfortable just hopping around all areas of the code and hardening a broad range of tests, but I'll put in my fair share again and see if something sparks.
>
> If we can get to a better point, I'll be happy to help police the appearance of new flaky tests.
>
> Even now I will place extra focus on preventing new tests that are not solid.
> The report helps me monitor that by trying to use git to get a create date for each test to display, and by listing the last 3 failure percentage results (newer tests will have 0, 1, or 2 entries).
>
> I've spent a lot of time just getting it all automated, building my confidence in its results, and driving down the time it takes to generate a report so that I can do it more frequently and/or for much longer iterations.
>
> Over time there are a lot of little ways to improve efficiency on that front though. For example, new tests can be run for many more iterations, tests that have failed before can be run for more iterations, etc. Linking the reports together gives us some ammo for choosing how much time and beasting we should spend on a test, and whether a test is improving, getting worse, etc. Data we can act on with confidence. I also have all these reports as tsv files so they can be easily imported into just about any system for longer or more intense cross-report tracking or something.
>
> - Mark
>
> On Thu, Feb 9, 2017 at 3:39 PM Dawid Weiss <[email protected]> wrote:
>
> This is a very important, hard and ungrateful task. Thanks for doing this Mark. As you know I tried (with your help!) to clean some of this mess a long time ago and sadly failed. It'd be great to speed those tests up and make them more robust.
>
> Dawid
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
> --
> - Mark
> about.me/markrmiller

--
- Mark
about.me/markrmiller
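As a footnote to Greg's post-commit idea in the thread above: the job's first step — spotting which tests a commit touched — is the easy part. A hypothetical sketch (the `Test*`/`*Test` naming convention matches Lucene/Solr, but nothing here is an existing tool):

```python
import os

def tests_in_commit(changed_paths):
    """Filter a commit's changed file paths (e.g. the output of
    `git diff --name-only HEAD~1`) down to test class names."""
    tests = []
    for path in changed_paths:
        name = os.path.basename(path)
        if not name.endswith(".java"):
            continue
        cls = name[: -len(".java")]
        if cls.startswith("Test") or cls.endswith("Test"):
            tests.append(cls)
    return tests
```

Each name would then be handed off to the beasting harness, with the results posted back to JIRA.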
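The "100 runs, 10 in parallel" data point above is worth putting numbers on. A quick sketch of the statistics — the Wilson score interval is my choice of illustration here, not something the report itself computes:

```python
import math

def wilson_interval(fails, runs, z=1.96):
    """95% confidence interval for a test's true failure rate,
    given `fails` failures in `runs` independent runs."""
    p = fails / runs
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half = (z / denom) * math.sqrt(
        p * (1 - p) / runs + z * z / (4 * runs * runs))
    return center - half, center + half

# 40 failures in 100 runs gives an interval of roughly (0.31, 0.50) --
# nowhere near zero, so "I can't see it in my env" doesn't hold up.
```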
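Since the reports land as tsv files, the cross-report tracking mentioned above is mostly bookkeeping. A sketch, with an invented two-column layout (test name, failure percentage) standing in for the real report format:

```python
import csv
import io

def failure_history(tsv_reports):
    """Merge per-run tsv reports (test<TAB>fail_pct per line) into
    a per-test history of failure percentages, oldest report first."""
    history = {}
    for tsv in tsv_reports:
        for row in csv.reader(io.StringIO(tsv), delimiter="\t"):
            history.setdefault(row[0], []).append(float(row[1]))
    return history

def last3(history, test):
    """The last 3 failure percentage results the report would show."""
    return history.get(test, [])[-3:]
```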
