Thanks everyone for the discussion on this very important topic. I think we should move this discussion off-list, but I'm not sure where to move it.
We've had "best practices" blog posts and papers in the past, and I think a blog post summary would be a great conclusion. Is there a volunteer to summarize all this great info and these resources?

Sincerely,
Cam

From: Discuss [mailto:[email protected]] On Behalf Of Gamblin, Todd
Sent: Tuesday, July 18, 2017 6:42 PM
To: Terri Yu <[email protected]>
Cc: Software Carpentry Discussion <[email protected]>
Subject: Re: [Discuss] writing unit tests for scientific software

Terri: you might be interested in services like codecov: https://codecov.io/

It will automatically display coverage reports if you set up your CI service (e.g. Travis) to submit them. Travis integration is pretty simple. We use codecov for Spack, and it has a nice Chrome plugin you can use to see coverage diffs in pull requests. You can also use it to add static checks to enforce that every PR submitted has a certain amount of test coverage.

The Spack repo with a coverage badge is here: https://github.com/LLNL/spack
The corresponding codecov visualization is here: https://codecov.io/gh/LLNL/spack
Or you can look at it by file and drill down: https://codecov.io/gh/LLNL/spack/list/develop/

-Todd

On Jul 18, 2017, at 5:26 PM, Terri Yu <[email protected]> wrote:

Hi all,

I realized that I have been doing a lot of the right things already, according to Titus and others. In retrospect, the reason I started this thread was that I couldn't find much writing on this topic and wanted some reassurance that I was on the right track. I thought it might be useful to share what I've already been doing in my unit tests (rough sketches of several of these appear right after this message).

1) I use the Python package "coverage". To make things easier, I wrote a bash script that measures coverage for the unit tests and displays the results for the relevant source code. I added a couple of tests based on missing coverage.

2) My software uses a command line interface based on the Python standard library argparse. I have a lot of unit tests that simply check input and output from running commands. A lot of it is checking for bad/invalid arguments, or checking output for things like: if I asked for 4 harmonics, I should get 4 columns of harmonic numbers.

3) Since the software I'm writing is a Python port of Matlab scripts, I check numerical results from my Python code against the output of the Matlab scripts. I extended the unittest TestCase class with a specialized assert function that uses NumPy's allclose() function to compare floating point results. I couldn't use the NumPy testing module directly because NumPy uses nose as its test framework, whereas I'm using unittest.

4) I do check a few numbers output by the algorithms in my unit tests, and I was wondering if there was any point in that, but the slide in Hans's presentation makes the good point that it's useful for checking earlier versions of the code against itself. Also, since I want the software to be cross-platform on Windows/Mac/Linux, it's a good way to check that all platforms give consistent results.

5) I also have a few unit tests where the parameters to an algorithm are changed from the default and the numbers are checked. This is useful for showing that the numbers change when the parameters change, though there is no assurance that the numbers are actually "correct". I wish I had some beautiful example with a known analytical result like the 2D Ising model, but I don't think I have any cases like that for the algorithms I'm using.

6) I have a couple of helper functions that have simple exact answers, so I can write unit tests that check for accurate results. For example, I have a helper function that rounds numbers according to the "half away from zero" method. Sort of like checking "2 + 2 = 4", as Titus mentioned.

7) Anywhere I raise an exception in the source code, I try to make sure there's a matching unit test that purposely triggers it and checks that the correct exception was raised.

Thanks everyone for your help, and I hope everyone is learning useful things from this thread.

Terri
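For concreteness, here is a rough sketch of the kind of coverage run Terri describes in point 1, written with coverage.py's Python API rather than a bash script. The package name "myaudio" and the "tests" directory are made-up placeholders, not her actual layout:

    # run_tests_with_coverage.py -- run the unit tests under coverage and print a report
    import unittest
    import coverage

    cov = coverage.Coverage(source=["myaudio"])   # restrict measurement to the project package
    cov.start()

    suite = unittest.defaultTestLoader.discover("tests")
    unittest.TextTestRunner(verbosity=2).run(suite)

    cov.stop()
    cov.save()
    cov.report(show_missing=True)   # terminal summary listing uncovered line numbers
    cov.xml_report()                # writes coverage.xml, which services like codecov can ingest

The missing-line numbers in the report are what point at the extra tests worth writing, and the XML report is roughly the piece a CI job would hand to a service like codecov, as in Todd's message above.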
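A minimal sketch of the command line tests in point 2, assuming (purely for illustration) a parser with a --harmonics option; argparse signals bad arguments by exiting, which surfaces in a test as SystemExit:

    import argparse
    import unittest

    def build_parser():
        # Stand-in for the real CLI; the actual program has many more options.
        parser = argparse.ArgumentParser(prog="analyze")
        parser.add_argument("--harmonics", type=int, default=4)
        return parser

    class TestCommandLine(unittest.TestCase):
        def test_parses_valid_arguments(self):
            args = build_parser().parse_args(["--harmonics", "4"])
            self.assertEqual(args.harmonics, 4)

        def test_rejects_non_integer_harmonics(self):
            # argparse reports invalid values by calling sys.exit(), i.e. raising SystemExit.
            with self.assertRaises(SystemExit):
                build_parser().parse_args(["--harmonics", "four"])

    if __name__ == "__main__":
        unittest.main()

Checks like "asking for 4 harmonics yields 4 columns" would run the full command (for example via subprocess) and inspect its output in the same style.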
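And a sketch of the extended TestCase from point 3: a small assertAllClose helper wrapping numpy.allclose, with invented reference numbers standing in for values exported from the Matlab scripts:

    import unittest
    import numpy as np

    class NumericTestCase(unittest.TestCase):
        """unittest.TestCase plus a float-tolerant array assertion."""

        def assertAllClose(self, actual, expected, rtol=1e-7, atol=0.0):
            if not np.allclose(actual, expected, rtol=rtol, atol=atol):
                self.fail("arrays differ beyond tolerance:\n%r\n!=\n%r" % (actual, expected))

    class TestAgainstMatlabReference(NumericTestCase):
        def test_matches_reference_output(self):
            # In the real tests the reference values would come from the Matlab scripts;
            # these numbers are invented purely for illustration.
            reference = np.array([1.0, 0.5, 0.25])
            result = np.array([1.0, 0.5, 0.25]) + 1e-12
            self.assertAllClose(result, reference)

    if __name__ == "__main__":
        unittest.main()

The same assertion covers points 4 and 5: pin down a handful of outputs under default parameters for cross-platform and version-to-version regression checks, then repeat with non-default parameters to confirm the numbers actually move.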
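Finally, sketches for points 6 and 7: an exact-answer helper (half-away-from-zero rounding, which differs from Python's built-in round()) and a deliberately triggered exception. Both functions here are toy stand-ins for the real code:

    import math
    import unittest

    def round_half_away_from_zero(x):
        # Toy stand-in: 2.5 -> 3 and -2.5 -> -3, unlike banker's rounding.
        return math.floor(x + 0.5) if x >= 0 else math.ceil(x - 0.5)

    def set_harmonics(n):
        # Toy stand-in for code that validates its input and raises on bad values.
        if n < 1:
            raise ValueError("number of harmonics must be at least 1")
        return n

    class TestExactHelpers(unittest.TestCase):
        def test_half_away_from_zero(self):
            self.assertEqual(round_half_away_from_zero(2.5), 3)
            self.assertEqual(round_half_away_from_zero(-2.5), -3)
            self.assertEqual(round_half_away_from_zero(2.4), 2)

    class TestExceptions(unittest.TestCase):
        def test_rejects_zero_harmonics(self):
            # One test per raise site: trigger it on purpose, check the exception type.
            with self.assertRaises(ValueError):
                set_harmonics(0)

    if __name__ == "__main__":
        unittest.main()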
On Tue, Jul 18, 2017 at 9:38 AM, Paul Wilson <[email protected]> wrote:

Hi Terri,

I'll briefly add that testing is asymptotic (as suggested by Titus below), so it may be difficult to have "every" test. We rely on code review to help identify missing tests, particularly for new code, but also for older code.

Paul

--
Paul P.H. Wilson
Grainger Professor of Nuclear Engineering
443 Engineering Research Bldg, 1500 Engineering Dr, Madison, WI 53706
[email protected] | 608-263-0807 | calendar: http://go.wisc.edu/pphw-cal
Computational Nuclear Engineering Research Group: http://cnerg.engr.wisc.edu/

On 07/17/2017 11:11 AM, C. Titus Brown wrote:

Hi Terri,

I think lots of people in the scientific Python community write their own algorithms and test them. But it's hard to give generic advice here, I think, because it's so dependent on your algorithm. Here's my try: the approach that has worked well for us over the last decade or so.

* First, write automated "smoke" tests that check whether your code is basically running and working. They should be as dumb and robust as possible, e.g. the equivalent of "check that 2 + 2 = 4". These are by far the most important in my experience, in that they deliver the most value for the least effort. (A minimal sketch of such a smoke test appears further down this digest.)

* Set up CI on those tests.

* Check code coverage of your code base, and try to get it to 30-40% by testing the basic code paths.

* Write a series of basic tests for edge cases (divide by zero, boundary conditions, that kind of thing), trying to cover another 10-20%.

* As your code base matures and complexifies, write tests for new functionality and try to cover old functionality as well. Here code coverage is your friend in terms of targeting effort.

* Whenever you discover a bug, write a test against that bug before fixing it. That way your most error-prone bits will get more coverage adaptively. I call this "stupidity driven testing."

Lather, rinse, repeat.

tl;dr? Smoke tests, code coverage analysis, test against buggy code.

best,
--titus

On Mon, Jul 17, 2017 at 11:50:59AM -0400, Terri Yu wrote:

Thanks everyone, those are interesting resources for testing in general. I'm using Python's unittest framework and everything is already set up. The specific problem I need help with is what tests to write in order to test numerical floating point output from algorithms. Given the responses I've gotten, it seems like not many people write their own algorithms and/or test them.

Terri

On Sun, Jul 16, 2017 at 5:50 PM, Jeremy Gray <[email protected]> wrote:

Hi Terri,

It might also be worth checking out the workshop from this year's PyCon by Eric Ma, "Best Testing Practices for Data Science", on YouTube here: https://www.youtube.com/watch?v=yACtdj1_IxE

The GitHub repo is here: https://github.com/ericmjl/data-testing-tutorial

Cheers,
Jeremy

On Fri, Jul 14, 2017 at 5:21 PM, Olav Vahtras <[email protected]> wrote:

Dear Terri,

In addition I can recommend the following resource: pythontesting.net has a podcast series on testing and more. Also check out the new book on pytest by the site maintainer, Brian Okken.

Regards,
Olav
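To illustrate the "smoke test" idea from Titus's message above, here is a minimal sketch in which analyze() is a tiny stand-in for a real entry point; the test only asserts that the call completes and returns something of the expected shape:

    import unittest
    import numpy as np

    def analyze(signal, n_harmonics=4):
        # Tiny stand-in for the real entry point, just so the sketch runs.
        return np.vstack([signal * k for k in range(1, n_harmonics + 1)])

    class SmokeTests(unittest.TestCase):
        """Dumb, robust checks in the spirit of 'check that 2 + 2 = 4'."""

        def test_runs_on_tiny_input(self):
            result = analyze(np.zeros(8))
            self.assertEqual(result.shape, (4, 8))

    if __name__ == "__main__":
        unittest.main()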
On 14 July 2017 at 21:36, Ashwin Srinath <[email protected]> wrote:

If you're using Python, numpy.testing has the tools you'll need: https://docs.scipy.org/doc/numpy/reference/routines.testing.html

There's also pandas.testing for testing code that uses Pandas.

Thanks,
Ashwin

On Fri, Jul 14, 2017 at 3:27 PM, Terri Yu <[email protected]> wrote:

Hi everyone,

Are there any resources that explain how to write unit tests for scientific software? I'm writing some software that processes audio signals, and there are many parameters. I'm wondering what's the best way to test floating point numeric results. Do I need to test every single parameter? How can I verify the accuracy of numeric results... use a different language or library?

I would like to do a good job of testing, but I also don't want to write a bunch of semi-useless tests that take a long time to run. I would appreciate any thoughts you have.

Thank you,
Terri
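For reference, the numpy.testing helpers Ashwin points to look roughly like this in use (the numbers are invented). assert_allclose raises an ordinary AssertionError with a mismatch report, so it can also be called from inside unittest test methods:

    import numpy as np
    from numpy.testing import assert_allclose

    expected = np.array([0.1, 0.2, 0.3])          # invented reference values
    computed = np.array([0.1, 0.2, 0.3]) + 1e-9   # pretend output of the code under test

    # Passes if every element agrees within the relative/absolute tolerances,
    # otherwise raises an AssertionError showing where and by how much they differ.
    assert_allclose(computed, expected, rtol=1e-7, atol=1e-8)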
_______________________________________________
Discuss mailing list
[email protected]
http://lists.software-carpentry.org/listinfo/discuss
