On Tue, Oct 9, 2012 at 7:12 PM, Jeff Garzik <jgar...@exmulti.com> wrote:
> * Data-driven tests: If possible, write software-neutral, data-driven
> tests. This enables clients other than the reference one (Satoshi
> client) to be tested. Embed tests in testnet3 chain, if possible.
The mention of testnet3 here reminds me to make a point: Confirmation bias is a common problem for software testing— people often over-test the success cases and under-test the failure cases.

This is certainly the case in Bitcoin: For example, testnet3 plus the packaged tests exercise all the branches inside the interior script evaluation engine _except_ the rejection cases. For us failure cases can be harder to package up (e.g. they can't be placed in testnet), but Matt's node-simulation based tester provides a good example of how to create a data-driven test set that covers both failure cases and dynamic behavior (e.g. reorgs).

Testing of failure cases is absolutely critical for testing implementation compatibility: A difference in what gets rejected by a widely deployed alternative node could result in an utterly devastating network split.

Generally, every test of something which must succeed should be matched by a test of something that must fail. Personally, I like to test the boundary cases— e.g. if something has an allowed range of [0-8], I'll test -1, 0, 8, 9 at a minimum (a rough sketch of what I mean is at the bottom of this mail). Though reasoning trumps rules of thumb.

Confirmation bias is another reason why it's important to have a more diverse collection of testers than the core developers. People who work closely with the software have strong expectations of how it should work and are less likely to test crazy corner cases because they "know" the outcome, sometimes erroneously.

To reinforce Jeff's list of different approaches: I've long found that each mechanism of software testing has diminishing returns the more of it you apply, so you're best off using many different approaches a little each rather than spending all your resources going as deep as possible with any one approach.

There are also some kinds of testing which are synergistic: Almost all testing is enhanced enormously by combining it with valgrind, because it substantially lowers the threshold of issue detection (e.g. detecting bogus memory accesses which aren't _currently_ causing a crash for you but could). If I could only pick one of "with valgrind" or "without", I'd pick "with" every time. Sadly valgrind doesn't exist on Windows and it's rather slow. Dr. Memory (http://code.google.com/p/drmemory/) may be an alternative on Windows, and there is work to port ASAN to GCC, so it may be possible to do mingw ASAN builds before too long.

I've also found that any highly automatable testing (coded data-driven, unit, and fuzz testing) combines well with diverse compilation, e.g. building on as many system types and architectures— including production-irrelevant ones— as possible, in the hope that some system-specific quirk makes a bug easier to detect.
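(Since I mention a sketch above: below is roughly what I mean by matched accept/reject vectors at the boundaries of a [0-8] range. This is just an illustrative C++ stand-in; the inAllowedRange() function and the vector table are made up and aren't anything from the Satoshi client. In a real data-driven setup the table would live in external, software-neutral data so other implementations could run the same vectors.)

  // Illustrative data-driven boundary test: an allowed range of [0-8]
  // gets matched accept cases (0, 8) and reject cases (-1, 9).
  #include <cstdio>

  // Stand-in for whatever validation rule is actually under test.
  static bool inAllowedRange(int v) { return v >= 0 && v <= 8; }

  struct TestVector { int input; bool shouldAccept; };

  static const TestVector vectors[] = {
      { -1, false },  // just below the range: must be rejected
      {  0, true  },  // lower bound: must be accepted
      {  8, true  },  // upper bound: must be accepted
      {  9, false },  // just above the range: must be rejected
  };

  int main() {
      bool ok = true;
      for (unsigned i = 0; i < sizeof(vectors)/sizeof(vectors[0]); ++i) {
          const TestVector& t = vectors[i];
          if (inAllowedRange(t.input) != t.shouldAccept) {
              std::printf("FAIL: input %d, expected %s\n", t.input,
                          t.shouldAccept ? "accept" : "reject");
              ok = false;
          }
      }
      return ok ? 0 : 1;
  }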