Hi Pieter,

On Thu, Sep 20, 2018 at 12:48:35AM +0200, PiBa-NL wrote:
> Test takes like 5 seconds to run here, and while that is a bit long if
> you get a hundred more similar tests and want to continue tweaking
> developments while running tests in between, it wouldn't hurt to run
> such a series of longer tests before creating a patch and submitting it
> for inclusion in the official git repository in my opinion, or before a
> release.
Definitely, however those run before a release should be almost 100%
system-agnostic. Having to prepare the system and tune it properly for
the test not to fail is going to cause a headache every time there is a
failure, because it will mean fixing the root cause and re-running the
whole suite, which is precisely what will cause the whole suite not to
be used at all anymore. This is why I'm really picky about the
reliability and the speed of these tests. They should be stupid-proof,
with me being the stupid one (and believe me, when it comes to doing
repetitive stuff, I'm among the stupidest persons you have ever met).

> My attempt was to test a bit differently than just looking for
> regressions of known fixed bugs, by putting a little load on haproxy
> so that threads and simultaneous actions 'might' get into
> conflicts/locks/stuff which might, by chance, show up, which is why I
> chose to go a little higher on the number of round-trips with an ever
> slightly increasing payload.

I really understand the point and I think it is valid to a certain
extent. But that's really not a thing to run by default. And I want to
encourage us (including me) to run reg tests from time to time. If you
know that some of them will take too long, you'll quickly end up
avoiding all the ones you can easily avoid with a single command (e.g.
playing with the LEVEL variable, or not running the suite at all).

> For me the test produces like 345 lines of output as attached, which
> seems not too bad (if the test succeeds).

It's already far too much for a user. I should only have to know whether
it works or not, otherwise it hides the output of all the other tests
(which is what happened). We must not have to parse the output to know
that we didn't break anything; we just have to check that it looks
normal.
Here's what "make reg-tests" gives me on 1.8:

  willy@wtap:haproxy$ time sh make-reg-tests-1.8
  #    top  TEST reg-tests/lua/b00002.vtc passed (0.159)
  #    top  TEST reg-tests/lua/b00001.vtc passed (0.122)
  #    top  TEST reg-tests/lua/b00000.vtc passed (0.110)
  #    top  TEST reg-tests/lua/b00003.vtc passed (0.137)
  #    top  TEST reg-tests/connection/b00000.vtc passed (0.172)
  #    top  TEST reg-tests/server/b00000.vtc passed (0.110)
  #    top  TEST reg-tests/spoe/b00000.vtc passed (0.008)
  #    top  TEST reg-tests/ssl/b00000.vtc passed (0.139)
  #    top  TEST reg-tests/stick-table/b00001.vtc passed (0.110)
  #    top  TEST reg-tests/stick-table/b00000.vtc passed (0.110)
  #    top  TEST reg-tests/log/b00000.vtc passed (0.125)
  #    top  TEST reg-tests/seamless-reload/b00000.vtc passed (0.123)

  real    0m1.713s
  user    0m0.316s
  sys     0m0.068s

As you can see there's no output to parse, it *visually* looks correct.

> Besides the 2 instances of cli output for stats, it seems not that
> much different from other tests. And with 1.8.13 on FreeBSD (without
> kqueue) it succeeds:
>   #    top  TEST ./test/b00000-loadtest.vtc passed (4.800

OK, then you get a valid output there. It's here that it's ugly. But we
spend enough time analysing bugs; I really refuse to spend extra time
fixing bugs in tests supposed to help detect bugs, otherwise it becomes
recursive...

> Taking into account conntrack and ulimit, would that mean we can never
> 'reg-test' whether haproxy can really handle like 10000 connections
> without issue?

10k conns definitely is way beyond what you can expect from a non-root
user on a regular shell. I run most of my configs with "maxconn 400"
because that's less than 1024 FDs once you add the extra FDs for
listeners and checks. Anything beyond that will depend on the user's
setup and becomes tricky. And in this case it's more a matter of
stress-testing the system, and we can have stress-test procedures or
tools for that (just like we all do on our respective setups with
different tools).
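The "maxconn 400" figure can be sanity-checked with a bit of arithmetic;
the listener, check and misc counts below are illustrative assumptions,
not taken from any real config:

```shell
# Rough FD budget behind "maxconn 400" (listener/check/misc counts are
# illustrative assumptions, not from a real config)
maxconn=400
conn_fds=$((2 * maxconn))   # one client-side + one server-side FD per proxied connection
listeners=10                # assumed number of bind sockets
checks=100                  # assumed number of concurrent health checks
misc=10                     # logs, pipes, internal sockets...
total=$((conn_fds + listeners + checks + misc))
echo "worst case: $total FDs out of the usual 1024 ulimit"  # 920
```

With these assumptions the worst case stays comfortably below the
default 1024 FD ulimit, which is the whole point of picking 400.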
It's just that one has to know in advance that some preparation is
needed (typically killing a possible browser, unloading some modules,
checking that there's still memory left, maybe adding some addresses to
the loopback, etc). So it's a different approach and it definitely is
out of the scope of automating the detection of potential regressions
during development.

> Or should the environment be configured by the test? That seems very
> tricky at least and probably would be undesirable.

No, definitely, that would be even worse. For sure those used to running
"curl | sudo bash" will have no problem letting it configure their
system, but I'm not among such people and I appreciate a lot that my
machine works every morning when I want to work :-)

> > $ make reg-tests/heavy/conn-counter-3000-req.log
>
> I'm not exactly sure.. ("make: don't know how to make reg-tests.
> Stop"). I would still like to have a way to run all 'applicable' tests
> with one command, even if it takes an hour or so to verify that
> haproxy is working 'perfectly'.

One thing is important to keep in mind regarding automated testing: the
tests are *only* useful if they take less cumulated time to detect a bug
than the time it would have taken to detect it oneself. I mean (I'm
exaggerating a bit but you'll get it), if the tool takes 1 minute per
build, 100 builds a day, that's roughly 25000 minutes per year, or
roughly 52 work days at 8h/day. I definitely claim that a human will not
waste 52 full days a year to detect a bug, not even to fix it. So there
is a reasonable tradeoff to set. That's also why I'm saying that I'm
*not* interested in tests for already fixed bugs; they only waste test
time. Their only purpose is for backports, because like any reproducer,
it helps the stable team to verify that 1) the bug is indeed present in
the stable version and needs a fix, and 2) the backport was properly
done.
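The cumulated-time figures above check out; here is the back-of-the-
envelope arithmetic, with the ~250 work days per year being my own
assumption:

```shell
# 1 minute per build, 100 builds per day, ~250 work days per year
min_per_day=$((1 * 100))
min_per_year=$((min_per_day * 250))
echo "$min_per_year minutes per year"   # 25000
work_days=$((min_per_year / 60 / 8))    # at 8 hours per day
echo "~$work_days work days per year"   # ~52
```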
But once done, this test becomes useless, and for now I don't have a
good solution to propose to keep such tests without having to re-run
them. I suspect that we could use sequence numbers for them, or maybe
just dates, and have a file somewhere that we update from time to time,
which contains the earliest version that we're going to test in that
category (i.e. "run tests for all bugs fixed since 20170101"). It would
only require a single commit in each maintenance release to bump that
file and say "OK, no need to test these ones anymore, they are fixed".

My interest is in large coverage, functional coverage. We can have
config files making use of 10-15% of the known features at once, which
will fail if any of these features gets broken, but will never ever fail
otherwise. This is useful testing. But it's not as easy to implement as
it seems, because once you factor in the maintenance aspect, you'll
realise that sometimes you have to update the test file to adjust
something related to a minor behaviour change, and that it doesn't
backport as easily. But that's where most of the value lies in my
opinion.

> But like abns@ tests can't work on FreeBSD; they should not 'fail',
> perhaps they could get skipped automatically though?

Very likely. In fact, given that we want *functional* coverage, this
means that either the test is specific to abns and should be skipped on
FreeBSD, or it's generic and makes use of abns because it was convenient
there, and it has to be modified to be portable.

> Anyhow that's a question for my other mail-topic
> ( https://email@example.com/msg31195.html )

Thanks for the link, I think I missed this one.

> > Thus for now I'm not applying your patch, but I'm interested in
> > seeing what can be done with it.
>
> Okay, no problem :) I'll keep running this particular test myself for
> the moment, it 'should' be able to pass normally (on my environment
> anyhow).

(...)
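The "earliest bug date" idea could look something like this; everything
here (file naming convention, cutoff file name, directory layout) is a
hypothetical sketch of the mechanism, not anything that exists today:

```shell
# Hypothetical sketch: each reproducer is named after the date its bug
# was fixed, and a cutoff file (bumped once per maintenance release)
# says which ones no longer need to run.
demo=$(mktemp -d)
cd "$demo"
touch b-20160501.vtc b-20170312.vtc b-20180115.vtc
echo 20170101 > earliest-bug-date

cutoff=$(cat earliest-bug-date)
for t in b-*.vtc; do
    d=${t#b-}; d=${d%.vtc}            # extract the date from the name
    if [ "$d" -ge "$cutoff" ]; then
        echo "RUN  $t"
    else
        echo "SKIP $t (fixed before $cutoff)"
    fi
done
```

Bumping the single date in earliest-bug-date is then the one-commit-per-
release step described above.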
> I'm interested in Fred's and anyone else's opinion ;) and well, maybe
> this particular test-case could be replaced by something simpler/faster
> with more or less the same likelihood of catching yet unknown issues?
> Looking forward to reactions :)

Yep. Despite what some people might think, I'm really interested in
tests. I used to write test programs to detect lots of issues on PCs 30
years ago when I was still a kid; I even managed to detect some fake
chips and caches back then. That might also be why I insist on efficient
testing and not just testing which makes people feel good. I'd really
prefer to have only 20 quick tests covering more than 50% of the tricky
parts we regularly break, and which people never have any excuse for not
running, than 500 tests that are a pain to run or debug, or that
constantly report "OK" because they're related to bugs that were fixed 5
years ago and are impossible to meet again unless someone does it on
purpose.

However, I know that for this to work, we need to create momentum around
tests, process and methodology. If I start by asking that we work on
such efficient tests, we won't ever see anything, because each attempt
will suffer from the same failures we already see, and that will be
demotivating. By starting the way we do right now, we can experiment,
test, collect feedback and ideas, encourage people to use the tool to
help developers reproduce a bug in their environment, etc. Once enough
people have experience and a valuable opinion on what can be done, it
will be easier to go further and improve the situation. At the moment I
can say I'm really pleased to see that this is progressing faster than I
would have imagined ;-)

Cheers,
Willy