Re: Property based testing update

Eric Merritt Mon, 28 Mar 2011 07:20:53 -0700

Torben,

 This is beautiful, It is an extremely good introduction.  I am really
looking forward to seeing the error, shrinking stuff.


Eric

On Sun, Mar 27, 2011 at 3:31 PM, Torben Hoffmann
<[email protected]> wrote:
> Hi,
>
> I have converted the earlier mail into wikicreole markup - see
> https://github.com/lehoff/erlware_commons/blob/master/pbt.creole - and my
> colleague Raghav has improved the initial
>  ramblings on top of which I in turn have started going through a concrete
> example.
>
> Still work in progress!!e
>
> A aside: the github creole renderer does not work as well as my home made
> one (should release that to the public one day...) so please bear with it
> for now until I figure out where the text will get its own life.
>
> Cheers,
> Torben
>
> p.s. for ease of commenting for those who do not go in and modify the
> description directly here is the updated description:
>
> Property based testing for unit testers
>
> Main contributors: Torben Hoffmann, Raghav Karol, Eric Merritt
>
> The purpose of the short document is to help people who are familiar with
> unit testing understand how property based testing (PBT) differs, but also
> where the thinking is the same.
>
> This document focusses on the PBT tool PropEr for Erlang since that is what
> I am familiar with, but the general principles applies to all PBT tools
> regardless of which language they are written in.
>
> The approach taken here is that we hear from people who are used to working
> with unit testing regarding how they think when designing their tests and
> how a concrete test might look.
>
> These descriptions are then "converted" into the way it works with PBT, with
> a clear focus on what stays the same and what is different.
>
> Testing philosophies
>
> A quote from Martin Logan:
>
> For me unit testing is about contracts. I think about the same things
> I think about when I write statements like {ok, Resp} =
> Mod:Func(Args). Unit testing and writing specs are very close for me.
> Hypothetically speaking lets say a function should return return {ok,
> string()} | {error, term()} for all given input parameters then my
> unit tests should be able to show that for a representative set of
> input parameters that those contracts are honored. The art comes in
> thinking about what that set is.
>
> The trap in writing all your own tests can often be that we think about the
> set in terms of what we coded for and not what may indeed be asked of our
> function. As the code is tried in further exploratory testing and in
> production new input parameter sets for which the given function does not
> meet the stated contract are discovered and added to the test case once a
> fix has been put into place.
>
> This is a very good description of what the ground rules for unit testing
> are:
>
> Checking that contracts are obeyed.
> Creating a representative set of input parameters.
>
> The former is very much part of PBT - each property you write will check a
> contract, so that thinking is the same.
>
> xUnit vs PBT
>
> Unit testing has become popular for software testing with the advent of
> xUnit tools like jUnit for Java. xUnit like tools typically provide a
> testing framework with the following functionality
>
> test fixture setup
> test case execution
> test fixture teardown
> test suite management
> test status reporting and management
>
> While xUnit tools provide a lot of functionality to execute and manage test
> cases and suites, reporting results there is no focus on test case execution
> step, while this is the main focus area of property-based testing (PBT).
>
> Consider the following function specification
>
> sort(list::integer()) ---> list::integer() | error
>
> A verbal specification of this function is,
>
> For all input lists of integers, the sort function returns a sorted list of
> integers.
>
> For any other kind of argument the function returns the atom error.
>
> The specification above may be a requirement of how the function should
> behave or even how the function does behave. This distinction is important;
> the former is the requirement for the function, the latter is the actual
> API. Both should be the same and that is what our testing should confirm.
> Test cases for this function might look like
>
> assertEqual(sort([5,4,3,2,1]), [1,2,3,4,5])
> assertEqual(sort([1,2,3,4,5]), [1,2,3,4,5])
> assertEqual(sort([]         ), []         )
> assertEqual(sort([-1,0, 1]  ), [-1, 0, 1] )
>
> How many tests cases should we write to be convinced that the actual
> behaviour of the function is the same as its specification? Clearly, it is
> impossible to write tests cases for all possible input values, here all
> lists of integers, the art of testing is finding individual input values
> that are representative of a large part of the input space. We hope that the
> test cases are exhaustive to cover the specification. xUnit tools offer no
> support for this and this is where PBT and PBT Tools like PropEr and
> QuickCheck come in.
>
> PBT introduces testing with a large set of random input values and verifying
> that the specification holds for each input value selected. Functions used
> to generate input values, generators, are specified using rules and can be
> simply composed together to construct complicated values. So, a property
> based test for the function above may look like:
>
> FOREACH({I, J, InputList},  {nat(), nat(), integer_list()},
>     SUCHTHAT(I < J andalso J < length(InputList),
>     SortedList = sort(InputList)
>     length(SortedList) == length(InputList)
>     andalso
>     lists:get(SortedList, I) =< lists:get(SortedList, J))
>
> The property above works as follows
>
> Generate a random list of integers InputList and two natural numbers I, J,
> such that I < J < size of InputList
> Check that size of sorted and input lists is the same.
> Check that element with smaller index I is less than or equal to element
> with larger index J in SortedList.
>
> Notice in the property above, we specify property. Verification of the
> property based on random input values will be done by the property based
> tool, therefore we can generated a large number of tests cases with random
> input values and have a higher level of confidence that the function when
> using unit tests alone.
>
> But it does not stop at generation of input parameters. If you have more
> complex tests where you have to generate a series of events and keep track
> of some state then your PBT tool will generate random sequences of events
> which corresponds to legal sequences of events and test that your system
> behaves correctly for all sequences.
>
> So when you have written a property with associated generators you have in
> fact created something that can create numerous test cases - you just have
> to tell your PBT tool how many test cases you want to check the property on.
>
> Shrinking the bar
>
> At this point you might still have the feeling that introducing the notion
> of some sort of generators to your unit testing tool of choice would bring
> you on par with PBT tools, but wait there is more to come.
>
> When a PBT tool creates a test case that fails there is real chance that it
> has created a long test case or some big input parameters - trying to debug
> that is very much like receiving a humongous log from a system in the field
> and try to figure out what cause the system to fail.
>
> Enter shrinking...
>
> When a test case fails the PBT tool will try to shrink the failing test case
> down to the essentials by stripping out input elements or events that does
> not cause the failure. In most cases this results in a very short
> counterexample that clearly states which events and inputs are required to
> break a property.
>
> As we go through some concrete examples later the effects of shrinking will
> be shown.
>
> Shrinking makes it a lot easier to debug problems and is as key to the
> strength of PBT as the generators.
>
> Converting a unit test
>
> We will now take a look at one possible way of translating a unit test into
> a PBT setting.
>
> The example comes from Eric Merritt and is about the add/2 function in the
> ec_dictionary instance ec_gb_trees.
>
> The add function has the following spec:
>
> -spec add(ec_dictionary:key(), ec_dictionary:value(), Object::dictionary())
> ->
>     dictionary().
>
> and it is supposed to do the obvious: add the key and value pair to the
> dictionary and return a new dictionary.
>
> Eric states his basic expectations as follows:
>
> I can put arbitrary terms into the dictionary as keys
> I can put arbitrary terms into the dictionary as values
> When I put a value in the dictionary by a key, I can retrieve that same
> value
> When I put a different value in the dictionary by key it does not change
> other key value pairs.
> When I update a value the new value in available by the new key
> When a value does not exist a not found exception is created
>
> The first two expectations regarding being able to use arbritrary terms as
> keys and values is a job for generators.
>
> The latter four are prime candidates for properties and we will create one
> for each of them.
>
> Generators
>
> key() -> any().
>
> value() -> any().
>
> For PropEr this approach has the drawback that creation and shrinking
> becomes rather time consuming, so it might be better to narrow to something
> like this:
>
> key() -> union([integer(),atom()]).
>
> value() -> union([integer(),atom(),binary(),boolean(),string()]).
>
> What is best depends on the situation and intended usage.
>
> Now, being able to generate keys and values is not enough. You also have to
> tell PropEr how to create a dictionary and in this case we will use a
> symbolic generator (detail to be explained later).
>
> sym_dict() ->
>     ?SIZED(N,sym_dict(N)).
>
> sym_dict(0) ->
>     {'$call',ec_dictionary,new,[ec_gb_trees]};
> sym_dict(N) ->
>     ?LAZY(
>        frequency([
>                 {1, {'$call',ec_dictionary,remove,[key(),sym_dict(N-1)]}},
>                 {2, 
> {'$call',ec_dictionary,add,[value(),value(),sym_dict(N-1)]}}
>                ])
>       ).
>
> sym_dict/0 uses the ?SIZED macro to control the size of the generated
> dictionary. PropEr will start out with small numbers and gradually raise it.
>
> sym_dict/1 is building a dictionary by randomly adding key/value pairs and
> removing keys. Eventually the base case is reached which will create an
> empty dictionary.
>
> The ?LAZY macro is used to defer the calculation of the sym_dict(N-1)}}
> until they are needed and {{{frequency/1 is used to ensure that twice as
> many adds compared to removes are done. This should give rather more
> interesting dictionaries in the long run, if not one can alter the
> frequencies accondingly.
>
> But does it really work?
>
> That is a good question and one that should always be asked when looking at
> genetors. Fortunately there is a way to see what a generator produces
> provided that the generator functions are exported.
>
> Hint: in most cases it will not hurt to throw in a -compile(export_all). in
> the module used to specify the properties. And here we actually have a
> sub-hint: specify the properties in a separate file to avoid peeking inside
> the implementation! Base the test on the published API as this is what the
> users of the code will be restricted to.
>
> When the test module has been loaded you can test the generators by starting
> up an Erlang shell (this example uses the erlware_commons code so get
> yourself a clone to play with):
>
> $ erl -pz ebin -pz test
> 1> proper_gen:pick(ec_dictionary_proper:key()).
> {ok,4}
> 2> proper_gen:pick(ec_dictionary_proper:key()).
> {ok,35}
> 3> proper_gen:pick(ec_dictionary_proper:key()).
> {ok,-5}
> 4> proper_gen:pick(ec_dictionary_proper:key()).
> {ok,48}
> 5> proper_gen:pick(ec_dictionary_proper:key()).
> {ok,'\036\207_là´?\nc'}
> 6> proper_gen:pick(ec_dictionary_proper:value()).
> {ok,2}
> 7> proper_gen:pick(ec_dictionary_proper:value()).
> {ok,-14}
> 8> proper_gen:pick(ec_dictionary_proper:value()).
> {ok,-3}
> 9> proper_gen:pick(ec_dictionary_proper:value()).
> {ok,27}
> 10> proper_gen:pick(ec_dictionary_proper:value()).
> {ok,-8}
> 11> proper_gen:pick(ec_dictionary_proper:value()).
> {ok,[472765,17121]}
> 12> proper_gen:pick(ec_dictionary_proper:value()).
> {ok,true}
> 13> proper_gen:pick(ec_dictionary_proper:value()).
> {ok,<<>>}
> 14> proper_gen:pick(ec_dictionary_proper:value()).
> {ok,<<89,69,18,148,32,42,238,101>>}
> 15> proper_gen:pick(ec_dictionary_proper:sym_dict()).
> {ok,{'$call',ec_dictionary,add,
>         [[114776,1053475],
>          'fª\020\227\215',
>          {'$call',ec_dictionary,add,
>              ['',true,
>               {'$call',ec_dictionary,add,
>                   ['2^Ø¡',
>                    [900408,886056],
>
> {'$call',ec_dictionary,add,[[48618|...],<<...>>|...]}]}]}]}}
> 16> proper_gen:pick(ec_dictionary_proper:sym_dict()).
> {ok,{'$call',ec_dictionary,add,
>         [10,'a¯\214\031fõC',
>          {'$call',ec_dictionary,add,
>              [false,-1,
>               {'$call',ec_dictionary,remove,
>                   ['d·ÉV÷[',
>                    {'$call',ec_dictionary,remove,[12,{'$call',...}]}]}]}]}}
>
> That does not look too bad, so we will continue with that for now.
>
> Properties of {{{add/2}}}
>
> The first expectation Eric had about how the dictionary works was that if a
> key had been stored it could be retrieved.
>
> One way of expressing this could be with this property:
>
> prop_get_after_add_returns_correct_value() ->
>     ?FORALL({Dict,K,V}, {sym_dict(),key(),value()},
>           begin
>               try ec_dictionary:get(K,ec_dictionary:add(K,V,Dict)) of
>                   V ->
>                       true;
>                   _ ->
>                       false
>               catch
>                   _:_ ->
>                       false
>               end
>           end).
>
> This property reads that for all dictionaries get/2 using a key from a
> key/value pair just inserted using the add/3 function will return that
> value. If that is not the case the property will evaluate to false.
>
> Running the property is done using proper:quickcheck/1:
>
> proper:quickcheck(ec_dictionary_proper:prop_get_after_add_returns_correct_value()).
> ....................................................................................................
> OK: Passed 100 test(s).
> true
>
> This was as expected, but at this point we will take a little detour and
> introduce a mistake in the ec_gb_trees implementation and see how that
> works.
>
> --
> http://www.linkedin.com/in/torbenhoffmann
>

-- 
You received this message because you are subscribed to the Google Groups 
"erlware-dev" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/erlware-dev?hl=en.

Re: Property based testing update

Reply via email to