Hi Pieter,

On Thu, Sep 20, 2018 at 12:48:35AM +0200, PiBa-NL wrote:
> Test takes like 5 seconds to run here, and while that is a bit long if you
> get a hundred more similar tests and want to continue tweaking developments
> while running tests in between. It wouldn't hurt to run such a (series) of
> longer tests before creating a patch and submitting it for inclusion on the
> official git repository in my opinion, or before a release.

Definitely, however those run before a release should be almost 100%
system-agnostic. Having to prepare the system and tune it properly for
the test not to fail is going to cause a headache every time there is
a failure, because it will mean fixing the root cause and re-running the
whole suite, which is precisely what will cause the suite not to be
used at all anymore. This is why I'm really picky about the reliability
and the speed of these tests. They should be stupid-proof, with me being
the stupid one (and believe me, when it comes to doing repetitive stuff,
I'm among the stupidest people you have ever met).

> My attempt was
> to test a bit differently than just looking for regressions of known fixed
> bugs, and putting a little load on haproxy so that threads and simultaneous
> actions 'might' get into conflicts/locks/stuff which might by chance, show
> up, which is why i choose to go a little higher on the number of round-trips
> with ever slightly increasing payload..

I really understand the point and I think it is valid to a certain extent.
But that's really not a thing to run by default. And I want to encourage us
(including me) to run reg tests from time to time. If you know that some of
them will take too long, you'll quickly end up avoiding all the ones you can
easily avoid using a single command (e.g. by playing with the LEVEL
variable, or by not running them at all).

> For me the test produces like 345 lines of output as attached. which seems
> not to bad (if the test succeeds).

It's already far too much for a user. I should only need to know whether
it works or not, otherwise it hides the output of all the other ones (which
is what happened). We must not have to parse the output to know whether we
broke anything, we just have to check that it looks normal. Here's what
make reg-tests gives me on 1.8:

  willy@wtap:haproxy$ time sh make-reg-tests-1.8 
  #    top  TEST reg-tests/lua/b00002.vtc passed (0.159)
  #    top  TEST reg-tests/lua/b00001.vtc passed (0.122)
  #    top  TEST reg-tests/lua/b00000.vtc passed (0.110)
  #    top  TEST reg-tests/lua/b00003.vtc passed (0.137)
  #    top  TEST reg-tests/connection/b00000.vtc passed (0.172)
  #    top  TEST reg-tests/server/b00000.vtc passed (0.110)
  #    top  TEST reg-tests/spoe/b00000.vtc passed (0.008)
  #    top  TEST reg-tests/ssl/b00000.vtc passed (0.139)
  #    top  TEST reg-tests/stick-table/b00001.vtc passed (0.110)
  #    top  TEST reg-tests/stick-table/b00000.vtc passed (0.110)
  #    top  TEST reg-tests/log/b00000.vtc passed (0.125)
  #    top  TEST reg-tests/seamless-reload/b00000.vtc passed (0.123)
  
  real    0m1.713s
  user    0m0.316s
  sys     0m0.068s

As you can see there's no output to parse, it *visually* looks correct.

> Besides the 2 instances of cli output
> for stats, its seems not that much different from other tests..
> And with 1.8.13 on FreeBSD (without kqueue) it succeeds:  #    top TEST
> ./test/b00000-loadtest.vtc passed (4.800

OK then you get a valid output there. It's here that it's ugly. But we
spend enough time analysing bugs, I really refuse to spend extra time
fixing bugs in tests supposed to help detect bugs, otherwise it becomes
recursive...

> Taking into account conntrack and ulimit, would that mean we can never
> 'reg-test' if haproxy can really handle like 10000 connections without
> issue?

10k conns definitely is way beyond what you can expect from a non-root
user on a regular shell. I run most of my configs with "maxconn 400"
because that's less than 1024 FDs once you add the extra FDs for
listeners and checks. Anything beyond that will depend on the user's
setup and becomes tricky. And in this case it's more a matter of
stress-testing the system, and we can have stress-test procedures or
tools (just like we all do on our respective setups with different
tools). It's just that one has to know in advance that some preparation
is needed (typically killing a possible browser, unloading some modules,
checking that there's still memory left, maybe adding some addresses to
the loopback, etc). So it's a different approach and it definitely is
out of the scope of automating the detection of potential regressions
during development.
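
As a rough sketch of the FD arithmetic above (Python, purely illustrative):
it assumes two FDs per proxied connection (one on the client side, one on
the server side), and the 50 extra FDs for listeners and checks are a
made-up figure, not a measured one. The 1024 limit is the usual default
reported by "ulimit -n" for a non-root shell.

```python
def fds_needed(maxconn, extra_fds):
    """Estimate the FDs needed for maxconn connections plus listeners/checks.

    Assumption: each proxied connection uses two FDs (client + server side).
    """
    return 2 * maxconn + extra_fds


def fits(maxconn, extra_fds, fd_limit=1024):
    """Tell whether the estimate stays within the per-process FD limit."""
    return fds_needed(maxconn, extra_fds) <= fd_limit


if __name__ == "__main__":
    # extra_fds=50 is a hypothetical allowance for listeners and checks.
    for maxconn in (400, 10000):
        print(maxconn, fds_needed(maxconn, 50), fits(maxconn, 50))
```

With these assumptions "maxconn 400" needs about 850 FDs and fits, while
10000 connections need about 20000 and cannot fit without tuning the
system first.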

> Or should the environment be configured by the test?? ,that seems
> very tricky at least and probably would be undesirable..

No, that would definitely be even worse. For sure those used to running
"curl | sudo bash" will have no problem letting it configure their
system, but I'm not among such people and I appreciate a lot that
my machine works every morning when I want to work :-)

> >   $ make reg-tests/heavy/conn-counter-3000-req.log
> I'm not exactly sure..("make: don't know how to make reg-tests. Stop"). i
> would still like to have a way to run all 'applicable' tests with 1 command,
> even if it takes a hour or so to verify haproxy is working 'perfectly'.

One thing is important to keep in mind regarding automated testing: the
tests are *only* useful if they take less cumulative time to detect a bug
than the time it would have taken to detect it oneself. I mean (I'm
exaggerating a bit but you'll get it), if the tool takes 1 minute per build,
100 builds a day, thus roughly 25000 minutes per year, that's roughly 52
work days at 8h/day. I definitely claim that a human will not waste 52 full
days a year to detect a bug, not even to fix it. So there is a reasonable
tradeoff to set. That's also why I'm saying that I'm *not* interested in
tests for already fixed bugs, they only waste test time. Their only purpose
is for backports, because like any reproducer, it helps the stable team to
verify that 1) the bug is indeed present in the stable version and needs
a fix, and 2) that the backport was properly done. But once done, this
test becomes useless and for now I don't have a good solution to propose
to keep them without having to re-run them.
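
The arithmetic above can be checked with a small sketch (Python, for
illustration only). The 250 working days per year is an assumption: it is
what makes 100 minutes a day come out at roughly 25000 minutes a year.

```python
# Assumed figures: about 250 working days a year, 8 hours per work day.
WORK_DAYS_PER_YEAR = 250
HOURS_PER_WORK_DAY = 8


def yearly_test_cost_days(minutes_per_build, builds_per_day):
    """Return the yearly time spent waiting for tests, in 8-hour work days."""
    minutes_per_year = minutes_per_build * builds_per_day * WORK_DAYS_PER_YEAR
    return minutes_per_year / 60 / HOURS_PER_WORK_DAY


if __name__ == "__main__":
    # 1 minute per build, 100 builds a day: 25000 minutes a year.
    print(round(yearly_test_cost_days(1, 100), 1))  # prints 52.1
    # Same rate with a 1.713 second run like the 1.8 suite shown earlier.
    print(round(yearly_test_cost_days(1.713 / 60, 100), 1))  # prints 1.5
```

So at the same build rate, a suite that runs in under two seconds costs
about a day and a half per year instead of 52.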

I suspect that we could use sequence numbers for such tests, or maybe
just dates, and have a file somewhere that we update from time to time,
which contains the earliest version that we're going to test in that
category (i.e. "run tests for all bugs fixed since 20170101"). It would
only require a single commit in each maintenance release to bump that
file and say "OK, no need to test these ones anymore, they are fixed".
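
A minimal sketch of that cutoff idea (Python, just to make it concrete).
Nothing like this exists in the tree: the one-line file format, the test
names and the dates below are all hypothetical.

```python
def parse_cutoff(text):
    """Parse the content of the cutoff file: a single YYYYMMDD date."""
    value = text.strip()
    if len(value) != 8 or not value.isdigit():
        raise ValueError("expected a YYYYMMDD date, got %r" % value)
    return int(value)


def select_tests(fix_dates, cutoff):
    """Keep only the tests for bugs fixed on or after the cutoff date.

    fix_dates maps a test path to the YYYYMMDD date of the fix it covers.
    """
    return sorted(path for path, fixed in fix_dates.items() if fixed >= cutoff)


if __name__ == "__main__":
    # Hypothetical test names and fix dates, for illustration only.
    fix_dates = {
        "reg-tests/example/old-bug.vtc": 20160512,
        "reg-tests/example/recent-bug.vtc": 20180301,
    }
    cutoff = parse_cutoff("20170101\n")
    print(select_tests(fix_dates, cutoff))
```

Bumping the date in that single file is then all a maintenance release
has to do to retire the older tests.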

My interest is in large coverage, functional coverage. We can have
config files making use of 10-15% of the known features at once, and
which will fail if any of these features gets broken, but will never ever
fail otherwise. This is useful testing. But it's not as easy to implement
as it seems, because once you factor in the maintenance aspect, you'll
realise that sometimes you have to update the test file to adjust something
related to a minor behaviour change and that it doesn't backport as easily.
But that's where most of the value lies in my opinion.

> But
> like abns@ tests cant work on FreeBSD, but they should not 'fail', perhaps
> get skipped automatically though.?.

Very likely. In fact given that we want *functional* coverage, this means
that either the test is specific to abns and should be skipped on FreeBSD,
or it's generic and makes use of abns because it was convenient there, and
it has to be modified to be portable.
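
One way to picture the "skip rather than fail" case (a Python sketch, not
how the reg-test scripts work today): each test would declare the features
it needs, and a small table would say where each feature exists. The
table and the function name are hypothetical; the only fact relied upon is
that abns@ uses Linux abstract namespace sockets.

```python
import sys

# Hypothetical table: feature name -> sys.platform prefixes where it exists.
FEATURE_PLATFORMS = {
    "abns": ("linux",),
}


def verdict(required_features, platform=None):
    """Return "run" or "skip" for a test; a missing OS feature never fails."""
    platform = sys.platform if platform is None else platform
    for feature in required_features:
        supported = FEATURE_PLATFORMS.get(feature)
        # str.startswith accepts a tuple of prefixes.
        if supported is not None and not platform.startswith(supported):
            return "skip"
    return "run"


if __name__ == "__main__":
    print(verdict(["abns"], "freebsd11"))  # prints skip
    print(verdict(["abns"], "linux"))      # prints run
```

A generic test that only used abns for convenience would simply not list
it once made portable, and would then run everywhere.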

> Anyhow thats a question for my other
> mail-topic ( https://www.mail-archive.com/haproxy@formilux.org/msg31195.html
> )

Thanks for the link, I think I missed this one.

> > Thus for now I'm not applying your patch, but I'm interested in seeing
> > what can be done with it.
> Okay no problem :) , ill keep running this particular test myself for the
> moment, it 'should' be able to pass normally..  (On my environment anyhow..)
(...)
> I'm interested in Fred's and anyone elses opinion ;) , and well maybe this
> particular test-case could be replaced by something simpler/faster/ with
> more or less the same likelihood of catching yet unknown issues..? Looking
> forward to reactions :) .

Yep. Despite what some people might think, I'm really interested in tests.
I used to write test programs to detect lots of issues on PCs 30 years ago
when I was still a kid, I even managed to detect some fake chips and caches
back then. That might also be why I insist on efficient testing and not just
testing which makes people feel good. I'd really prefer to have only 20
quick tests covering more than 50% of the tricky parts we regularly break
and which people never have any excuse for not running, than 500 tests that
are a pain to run or debug or that constantly report "OK" because they're
related to bugs that were fixed 5 years ago and that are impossible to meet
again unless someone does it on purpose.

However I know that for this to work, we need to create momentum around
tests, process and methodology. If I start by asking that we work on such
efficient tests, we won't ever see anything because each attempt will suffer
from the same failures we already see and will be demotivating. By starting
the way we do right now, we can experiment, test, collect feedback and ideas,
encourage people to use the tool to help developers reproduce a bug in their
environment, etc. Once enough people have some experience and a valuable
opinion on what can be done, it will be easier to go further and improve
the situation. At the moment I can say I'm really pleased to see that this
is progressing faster than I would have imagined ;-)

Cheers,
Willy
