On 11/19/2012 11:33 PM, Eric Rescorla wrote:
> TL;DR: I think this is almost 180 degrees opposed from the right strategy. What we need right now is coverage of the things that work, so that when we make changes we know we're not making the situation worse. *All* other testing work should wait on this. In particular, crashtests are not useful at this point.
I'll note that "crashtests" is a term being used loosely here. A crashtest is nothing more than "when I load this, nothing crashes", while a mochitest is (effectively) "when I load this, it reports ok or not-ok". So really, there isn't a lot of difference there beyond (for most WebRTC things) "we got to the normal end of the test".
The way some people are using "crashtests" is really "we once crashed here, capture that as a crashtest", with the assumption you're protecting against that (exact) regression.
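To make the shapes concrete, here's a minimal sketch of each (the file contents are hypothetical and heavily simplified; a crashtest is just a page the harness loads and watches for a crash, while a mochitest pulls in SimpleTest.js and reports real pass/fail results):

  <!-- crashtest sketch: the harness loads this and passes if nothing crashes -->
  <html>
    <script>
      // ... poke the code path that once crashed ...
    </script>
  </html>

  <!-- mochitest sketch: same idea, but it asserts and reports pass/fail -->
  <html>
    <script src="/tests/SimpleTest/SimpleTest.js"></script>
    <script>
      // ... poke the same code path, then check the result ...
      var resultLooksRight = true;  // placeholder for a real check
      ok(resultLooksRight, "the API did what we expected, not just 'didn't crash'");
    </script>
  </html>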
In active development of a new feature, tests for real-world(ish) cases are most important. As ekr points out, the first order of business is "don't break existing functionality". Testing for that on each checkin is far more important than checking whether we've regressed a minimized testcase from a specific bug.
Once we have the main functionality tests in place (both "sunny day" and "normal failure" cases), more-specific regression tests begin to rise in importance, though even those should usually be as broad a test as possible that still captures the failure, rather than a fully minimized testcase. The reason is that for a test to have value in the future, it should not just catch someone undoing the bugfix; it should also catch similarly-provoked failures as much as possible, since most future errors will not precisely duplicate an old bug, but will arise from similar stress on the component.
An example might be a bug triggered by having >10 DataChannels open. The minimized crashtest testcase might stop there, and it *would* prevent that specific bug from regressing. But if some new checkin caused >16 DataChannels to fail, that crashtest wouldn't catch it. A good (IMO) test would instead create an unreasonable number of DataChannels, do something basic with each of them to verify they all work, and then return success. Note that this ends up being a mochitest, not a crashtest! So don't think "this was a crash bug, it needs a crashtest". I would use crashtests only where there's no reasonable way to report success or failure. For example, if we changed an API, a crashtest would happily stay green, while the mochitest version would flag that we broke the test.
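As a rough sketch of what that broader mochitest might look like (using the modern, unprefixed WebRTC API and modern JS for readability; the channel count, names, and the bare-bones signaling are all illustrative, and a real in-tree test would lean on shared PeerConnection helpers):

  <html>
    <script src="/tests/SimpleTest/SimpleTest.js"></script>
    <script>
      SimpleTest.waitForExplicitFinish();

      // Deliberately "unreasonable": well past the >10 that triggered the original bug.
      const NUM_CHANNELS = 64;

      const pcLocal = new RTCPeerConnection();
      const pcRemote = new RTCPeerConnection();

      // Trickle ICE candidates directly between the two PeerConnections.
      pcLocal.onicecandidate  = e => e.candidate && pcRemote.addIceCandidate(e.candidate);
      pcRemote.onicecandidate = e => e.candidate && pcLocal.addIceCandidate(e.candidate);

      // The remote side just echoes whatever arrives on each channel.
      pcRemote.ondatachannel = e => {
        e.channel.onmessage = m => e.channel.send(m.data);
      };

      // Create all the channels, do something basic with each, and only report
      // success once every one of them has round-tripped a message.
      let echoed = 0;
      for (let i = 0; i < NUM_CHANNELS; i++) {
        const dc = pcLocal.createDataChannel("chan-" + i);
        dc.onopen = () => dc.send("ping-" + i);
        dc.onmessage = m => {
          is(m.data, "ping-" + i, "channel " + i + " echoed its message");
          if (++echoed == NUM_CHANNELS) {
            ok(true, "all " + NUM_CHANNELS + " DataChannels opened and worked");
            SimpleTest.finish();
          }
        };
      }

      // Minimal offer/answer dance; a real test would use the shared helpers.
      pcLocal.createOffer()
        .then(offer => pcLocal.setLocalDescription(offer))
        .then(() => pcRemote.setRemoteDescription(pcLocal.localDescription))
        .then(() => pcRemote.createAnswer())
        .then(answer => pcRemote.setLocalDescription(answer))
        .then(() => pcLocal.setRemoteDescription(pcRemote.localDescription));
    </script>
  </html>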
This does mean almost all tests should be mochitests. That's not a bad thing; it's a good thing! But the bigger takeaway is that tests should start with functionality (the feature is useless without that), then add general stressors (create/destroy fast, create a lot, evil arrangements, obvious edge cases, etc.), then (especially as the code matures and stabilizes) add very specific tests. If you focus on the very specific tests too early, you end up with a ton of them, many of which are no longer relevant after architectural changes and refactors (and you may spend a lot of time revising them while an API is still in flux!). This isn't an absolute, and it doesn't mean a crash bug never gets morphed into a (mochi)test - many still will - but again, those should be broader tests wherever possible, or the case should be worked into an existing broad test.
That's another point: a few broad tests are a lot simpler to revise when we have an unstable API than a thousand specific-instance testcases. Given that the W3C (and IETF) are still changing the API, we should be cautious about this. If a large, generic sunny-day test that hits much of the API fails, that's generally of (much) more use to me today than the same set of features being tested in 100 small separate tests, if for no other reason than the maintenance effect (and there are other reasons, per above).
> Analysis follows:
>
> Currently, the state of the code can be best described as brittle. The sunny day cases work and work fairly reliably, but as you deviate from them, you start to get crashes. This isn't surprising; it's a fairly normal state of affairs for software that was developed in a hurry with an emphasis on features rather than testing, which is what this is.
>
> There are three major classes of problems in the code:
>
> - Functional defects (stuff which should work and does not).
> - Crashes (i.e., memory errors, whether security relevant or not).
> - Memory and other resource leaks.
>
> There are a lot of all of these, and a number of them require fairly significant refactoring to fix. The result of this is that:
>
> 1. It's not going to be possible to *run* automated unit tests for some time (this is partly due to deficiencies in the testing framework, but you go to the release with the testing framework you have, not the one you wish you had.)
>
> 2. It's quite common to have multiple reports of problems due to the same underlying defect. Even when that's not true, we're often going to accidentally fix one defect while working on another.
Not just accidentally, but often on purpose, as we notice a logical hole that can cause (or has caused) a number of similar failures. A perfect example is the current "big lock" patch, which addresses a hole responsible for numerous semi-random, timing-related failures.
> 3. It's quite easy to break one piece of functionality while refactoring another. We've already had a number of instances of this.
Yes; that's my number one concern for tests at this point. We're checking in code with the only true cross-check being ourselves and other devs running manual tests and using the feature. While doing that is good, tests (automated or run-by-hand-but-scripted) would greatly reduce the "you broke calls" problem.
> Rather, what is useful is to develop unit tests for the stuff that does work, so that when we are fixing one thing we don't accidentally break another. I realize that these cannot currently be run automatically on m-c, but they can be run either (a) on a developer's own system or (b) on alder under the more resilient test framework Ted is hacking up. It already takes far too long to validate manually that your changes haven't broken stuff, and we don't have anything like full coverage in JS tests of the functionality that works. If we want people not to break the system, it needs to be reasonably easy to verify that one has not done so.
Agreed.
> The nice thing about this level of test is that it's easy to develop. Indeed, we already have a number of tests that test one variant manually. All we need is someone to stamp out a bunch of tests that (a) are automatic and (b) cover more of the space. If you find bugs doing that, then great. Otherwise, we have a regression harness and you can move on to crashtests.
Agreed, and honestly a basic set of such functionality tests should be fairly quick to develop and would provide significant help to the project early on. They don't (in my mind) have to be a lot of small, specific functionality tests. For example, I'd rather have one test that covers all the basic call scenarios (audio-only, video-only, audio/video, one-way audio, one-way video, etc.), because a single broad test is likely to be easier to revise if the API changes, and if it fails, we can find out why quickly. I'm not saying it has to be that way, but I might prefer it, especially if I can build a "basic call test framework" I can then internally automate. See things like the test_URIs xpcshell test for the URI code, where it has a framework and then runs a bazillion test/edge cases through it, all in one "test".
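In the spirit of test_URIs, a rough sketch of the shape I mean (everything here is hypothetical - runBasicCall() and the scenario table are not an existing API, and the real plumbing would live in shared helpers inside a mochitest page that has loaded SimpleTest.js):

  // One driver routine plus a table of scenarios, instead of one file per call variant.
  const CALL_SCENARIOS = [
    { name: "audio only",    audio: true,  video: false },
    { name: "video only",    audio: false, video: true  },
    { name: "audio + video", audio: true,  video: true  },
    { name: "one-way audio", audio: true,  video: false, oneWay: true },
    { name: "one-way video", audio: false, video: true,  oneWay: true },
  ];

  async function runBasicCall(scenario) {
    // Hypothetical shared plumbing: set up two PeerConnections, attach the
    // requested media, negotiate, and check that media flows in the expected
    // direction(s).
    // ...
    ok(true, "call completed: " + scenario.name);  // placeholder for real checks
  }

  async function runAllScenarios() {
    SimpleTest.waitForExplicitFinish();
    for (const scenario of CALL_SCENARIOS) {
      info("starting scenario: " + scenario.name);
      await runBasicCall(scenario);
    }
    SimpleTest.finish();
  }

  runAllScenarios();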
--
Randell Jesup

