Somewhat in the spirit of Noam's comment, and in my personal experience:

1) General unit testing is much better than no testing, for oft-repeated reasons.
2) Data code is different in kind from the code that gave rise to unit-test culture, for three material reasons: data has enormously more state than function parameters and variables normally do; the variance of production data tends to exceed the variance of development data; and, because the job of data code is to transmit all of that large and varying state, the notion of a "unit" doesn't really fit data processing / analysis pipelines.

3) Acceptance testing is good; monitoring is good; unifying them is extremely powerful. (Some sketches of the approaches discussed in this thread follow at the end.)

On Thu, May 11, 2017 at 7:00 AM, Noam Ross <[email protected]> wrote:

> To me (and mostly from an R perspective), the major difference between unit
> testing in packages and testing in analyses is that for the latter one tests
> objects, not functions. That is, rather than testing that a function gives
> you an expected output for given inputs, you are checking that objects (data
> sets, fitted models, summaries) have the properties you expect given your
> analysis assumptions. Most documentation and tutorials on testing I've seen
> focus on functions.
>
> Object-based testing of data sets is the approach used in the
> assertr/validate/pointblank packages (for data in R), or TDDA (for data in
> Python). I don't know of solutions tooled specifically to non-data
> components of analyses. These are tough because some of them, like model fit
> tests, can be subjective. I also don't know a great framework for separating
> tests from analysis scripts, as you need access to your objects to run
> tests.
>
> FWIW, for the upcoming rOpenSci unconf we have two potentially related
> projects: testing blocks for R Markdown documents
> (https://github.com/ropensci/unconf17/issues/38), and maybe something more
> general for saving and testing objects
> (https://github.com/ropensci/unconf17/issues/5).
>
> On Wed, May 10, 2017 at 12:35 PM Pariksheet Nanda
> <[email protected]> wrote:
>>
>> Hi Naupaka,
>>
>> I can't speak to what would be most successful in a classroom, though.
>> For my graduate research work with R, I follow the guidelines of
>> http://r-pkgs.had.co.nz/ without changing much, namely:
>>
>> 1) Create a package with devtools, then create the analysis script in
>> vignettes/ with devtools::use_vignette("name-of-vignette"). Edit the Rmd
>> file and run code blocks with Ctrl+Enter.
>> 2) Create functions as needed in the R/ directory.
>> 3) Add dependencies directly to DESCRIPTION (one could also use
>> devtools::use_package()).
>> 4) Update NAMESPACE from roxygen comments in R/ files with
>> devtools::document().
>> 5) Add testthat unit tests for those R functions.
>> 6) Once in a while, run devtools::check().
>>
>> The book's introduction suggests that this practice is somewhat typical
>> (http://r-pkgs.had.co.nz/intro.html), and I am curious about others'
>> experiences.
>>
>> There is some cognitive overhead to working with packages for newer
>> programmers, so you may want to put the workflow and gotchas in a
>> cheatsheet, or require some amount of prior R experience for the students to
>> value that particular way of doing things.
>>
>> Pariksheet
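
---

To make the object-based testing Noam describes concrete, here is a minimal sketch using assertr. The data frame `surveys` and its columns are made up for illustration; substitute the objects from your own pipeline:

```r
# Minimal sketch of object-based data validation with assertr.
# `surveys`, `site_id`, `species`, `detection_prob`, and `body_mass`
# are hypothetical names.
library(assertr)
library(magrittr)

surveys %>%
  verify(has_all_names("site_id", "species", "body_mass")) %>%  # expected columns exist
  assert(not_na, site_id, species) %>%             # key columns have no missing values
  assert(within_bounds(0, 1), detection_prob) %>%  # probabilities stay in [0, 1]
  insist(within_n_sds(4), body_mass)               # flag extreme outliers
```

By default the chain stops with an informative error at the first failed check, which is exactly the "acceptance test as monitoring" behavior point 3 above argues for: the same assertions can run in development and on each new batch of production data.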
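One workable pattern for Noam's last point (separating tests from the analysis while still having access to its objects) is to serialize the key objects at the end of the analysis and read them back inside ordinary testthat tests. This is only a sketch; the file path, the `lm` fit, and the 0.5 threshold are assumptions for illustration:

```r
# In the analysis script / vignette: persist the objects worth testing.
saveRDS(fit, "output/fit.rds")

# In tests/testthat/test-fit.R: reload the object and check its properties.
library(testthat)

fit <- readRDS("output/fit.rds")

test_that("the fitted model meets our analysis assumptions", {
  expect_true(inherits(fit, "lm"))          # still the model class we expect
  expect_gt(summary(fit)$r.squared, 0.5)    # explains at least half the variance
  expect_false(any(is.na(coef(fit))))       # no unestimable coefficients
})
```

Note that expectations like the R-squared cutoff encode the subjective model-fit judgments Noam mentions; the value of writing them down is that the judgment is made once, explicitly, rather than re-made by eye on every run.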
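Finally, a compressed sketch of the devtools workflow Pariksheet lists. Function names are per devtools 1.x, current when this thread was written (later releases move the `use_*` helpers into the usethis package); the package name and the dplyr dependency are placeholders:

```r
library(devtools)

# One-time setup (steps 1, 3, 5)
create("myanalysis")               # package skeleton; then work inside it
use_vignette("name-of-vignette")   # the analysis lives in vignettes/ as an Rmd
use_package("dplyr")               # record a dependency in DESCRIPTION
use_testthat()                     # set up tests/testthat/

# Recurring cycle (steps 4-6)
document()                         # regenerate NAMESPACE from roxygen comments
test()                             # run the testthat unit tests
check()                            # full R CMD check, once in a while
```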
