On 02/18/2011 07:30 AM, Eli Barzilay wrote:
50 minutes ago, Ryan Culpepper wrote:
On 02/15/2011 07:28 AM, Eli Barzilay wrote:
And finally, there's the litmus test for existing code:
* Ryan: is something like this enough to implement the GUI layer?
Not well, I think. The Test-Result type in Noel's racktest code is
too simple and inflexible. It represents the minimal essence of
testing, but it would be awkward to extend to richer testing
systems. Here's my counter-proposal for representing the results of
tests:
[...]
I can't make sense of it, besides a vague "waaaay too heavy" feeling
for something that should be core-ishly minimalistic.
Simplicity is no good if it gets in the way of representing information
that needs to be represented.
In an attempt to follow it, I did this:
TestResult = header
             execution
             status
but your TestHeader is used only there,
Not necessarily. A testing framework that distinguishes test
construction from test execution might create the header when the test is
constructed. SchemeUnit used to work that way, and RackUnit is able to,
although less gracefully than before.
(See also my final remark, about "test started" notifications.)
so it could be folded in:
TestResult = name       (U String #f)
             suite      (Listof String)
             info       Dictionary
             execution
             status
TestExecution is also used only once so it can also be folded in --
but since it's just a generic dictionary, it can be dropped.
I think it's a bad idea to collapse the two dictionaries, because they
represent different information. Especially since the set of keys is
open-ended, it is helpful to separate information about the test from
information about its execution.
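
A minimal sketch of that separation, assuming plain Racket structs and immutable hashes for the dictionaries. The struct names, field names, and dictionary keys are hypothetical and are not taken from the elided proposal:

```racket
#lang racket/base

;; Hypothetical names; the dictionaries are plain immutable hashes here.
(struct test-header (name suite info) #:transparent)  ; known before the test runs
(struct test-execution (info) #:transparent)          ; known only after it runs
(struct test-result (header execution status) #:transparent)

(define example-result
  (test-result
   (test-header "addition" '("arith" "basic") (hash 'location "arith.rkt:12"))
   (test-execution (hash 'cpu-time-ms 3))
   'success))

;; Information about the test and information about its run are looked up
;; in different dictionaries, so their open-ended key sets cannot collide.
(hash-ref (test-header-info (test-result-header example-result)) 'location)
(hash-ref (test-execution-info (test-result-execution example-result)) 'cpu-time-ms)
```

With a single merged dictionary, a key such as 'location would be ambiguous between where the test was defined and where its execution stopped.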
TestResult = name    (U String #f)
             suite   (Listof String)
             info    Dictionary
             status
Now, status is one of three options, the failure one has a dictionary
so it can be removed (folded into the above).
I object again to the conflation of unrelated dictionaries.
So overall, it looks like a simple struct with a name, a "suite" (kind
of, defined indirectly by a hierarchy of string lists), a generic
dictionary for "stuff", and a status. This is all modulo some
questions/issues that are unclear to me:
* It looks like it tries to break away too many pieces into a formal
description. For example, it looks like overkill to have fields
with the actual value and expected value and worse -- the
comparison. For example, what happens when the comparison is
parameterized (e.g., "close within dx to some number")?
Not every testing framework might use 'actual and 'expected, and even
frameworks that do might not use them all the time. For example, in
rackunit:
(check-equal? 'apple 'orange)
would result in the following failure attributes:
'actual => 'apple
'expected => 'orange
'comparison => 'equal?
(I think rackunit currently reports the check name, 'check-equal?,
rather than the comparison name. It could work either way, or maybe it
could include both.)
On the other hand, something like check-not-false might only report an
actual value. And a check parameterized over a tolerance could just
include the tolerance as an extra attribute.
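
To make that concrete, here is a sketch of what the attribute dictionaries might look like, assuming they are immutable hashes. The helper names and the 'tolerance and 'within keys are assumptions made for illustration, not anything rackunit defines:

```racket
#lang racket/base

;; Hypothetical helper: failure attributes for a check parameterized
;; over a tolerance. The tolerance is just one more attribute.
(define (within-attributes actual expected tolerance)
  (hash 'actual actual
        'expected expected
        'comparison 'within
        'tolerance tolerance))

;; Hypothetical helper: a check-not-false style check has only an
;; actual value to report, so the other keys are simply absent.
(define (not-false-attributes actual)
  (hash 'actual actual))

(define attrs (within-attributes 3.14 3.0 0.1))
(hash-ref attrs 'tolerance)
(hash-has-key? (not-false-attributes #f) 'expected)
```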
* What happens when there's no specific expected value to compare?
For example, run two pieces of code 10 times each and check
that the average runtime of the first is below the runtime of the
second. This could be phrased in terms of an expected value, but in
a superficial way, and will prevent useful information from being
expressed (since the information would have to be reduced to two
numbers).
You can include whatever information you want. That's why it's a
dictionary, rather than a fixed set of fields. The real question is how
a test result displayer will know how to interpret the fields correctly.
I think a useful default is to show all attributes with keys that are
interned symbols or strings. Custom attributes would only work for test
result displayers that know about them.
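
A sketch of that default, assuming the attributes are stored in an immutable hash. The function names and the example keys are hypothetical:

```racket
#lang racket/base

;; A key is shown by default if it is a string or an interned symbol.
;; Uninterned symbols are treated as private to whoever created them.
(define (displayable-key? k)
  (or (string? k)
      (and (symbol? k) (symbol-interned? k))))

;; Keep only the attributes that a generic displayer should show.
(define (displayable-attributes attrs)
  (for/hash ([(k v) (in-hash attrs)]
             #:when (displayable-key? k))
    (values k v)))

;; A custom attribute under an uninterned key: only a displayer that
;; holds this exact key value can look it up.
(define private-key (string->uninterned-symbol "timing-samples"))
(define attrs
  (hash 'actual 12.5
        "note" "average over 10 runs"
        private-key '(12 13 12.5)))

(displayable-attributes attrs)
```

A displayer that knows about the custom attribute can still read it with (hash-ref attrs private-key); everyone else just never sees it.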
* What if you don't want to hold on to the value? (For example, free
some related resource.)
Then convert it to a string and keep the string. (This is what the
rackunit gui does to report custodian-managed leftovers.)
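
Roughly like this, as a sketch only. The helper name and the 'leftover key are made up here, and this is not the rackunit gui's actual code:

```racket
#lang racket/base

;; Store the printed form of a value instead of the value itself, so the
;; test result does not keep the value (or its resources) reachable.
(define (attribute->string v)
  (format "~s" v))

(define port (open-input-string "temporary resource"))
(define attrs (hash 'leftover (attribute->string port)))

;; The port can now be released; the result only holds a string.
(close-input-port port)

(string? (hash-ref attrs 'leftover))
```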
* This solidifies the list-of-strings as a representation of the test
hierarchy. But perhaps there is no way to avoid this -- if it's
made into a proper hierarchy of objects it will probably complicate
things in a way that requires the listener to get "update" events
that tell it how the structure changed.
I was actually going to propose something more complicated for the
hierarchy, but I figured it was better to leave that for later. I'm
certainly open to changing this part.
* I'm not sure about the error result. It seems to me that this is a
meta issue that you're dealing with when you develop the test suite,
and as such it should be something that you'd deal with in the usual
ways => throw an exception. It's the tools that should be in charge
of catching such an exception and dealing with it -- which means that
- in my tester's case, it'll defer to racket as usual, meaning that
you'd just get an error.
- in rackunit's case you'd probably get some report listing the
erroneous tests, instead of propagating the error.
- and in your gui case you'd catch exceptions and show them as error
results.
Are you saying you think a status should only be success or failure? If
so, I disagree. I can see roughly how that would work, but I think it's
useful to distinguish between failure and error at the reporting level.
But if you mean that a testing framework should be allowed to halt on
errors instead of reporting them as error statuses, then I agree.
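
Both positions fit in one small runner. This is a sketch under stated assumptions: exn:check-failed, check-true*, run-test, and the #:halt-on-error? keyword are all hypothetical names, not part of rackunit or of either proposal:

```racket
#lang racket/base

;; Hypothetical exception type raised by a failing check.
(struct exn:check-failed exn:fail ())

;; Hypothetical check: fails unless v is exactly #t.
(define (check-true* v)
  (unless (eq? v #t)
    (raise (exn:check-failed "check failed" (current-continuation-marks)))))

;; Run a test thunk and classify the outcome as a status.
;; A failed check is a 'failure; any other exn:fail is an 'error,
;; unless halt-on-error? is set, in which case it propagates.
(define (run-test thunk #:halt-on-error? [halt-on-error? #f])
  (with-handlers ([exn:check-failed? (lambda (e) 'failure)]
                  [(lambda (e) (and (exn:fail? e) (not halt-on-error?)))
                   (lambda (e) 'error)])
    (thunk)
    'success))

(run-test (lambda () (check-true* #t)))
(run-test (lambda () (check-true* #f)))
(run-test (lambda () (car '())))
```

The three calls evaluate to 'success, 'failure, and 'error respectively. Passing #:halt-on-error? #t to the last one lets the contract error from car escape to the caller instead.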
Also:
And that's not quite the end of it. The rackunit gui creates an
entry for a test case as soon as it starts running, so the user can
see what test case is hanging and interrupt it if they choose. That
requires additional communication between test execution and test
display.
Yes, that would be part of the protocol for the listener -- and it
makes sense to allow tests to invoke it to let it know that a test has
started.
Like maybe sending it just the test-header struct? The part that
represents the information known about the test before it executes,
packaged up as one value?
Although, if we're going to standardize this part it would also be nice
to have a way of indicating that a suite has started, too.
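
Something along these lines, as a sketch. Every struct and function name below is an assumption; only the idea of sending the header in a "started" notification comes from the discussion above:

```racket
#lang racket/base

;; Hypothetical header and event structs.
(struct test-header (name suite info) #:transparent)
(struct suite-started (path) #:transparent)
(struct test-started (header) #:transparent)
(struct test-finished (header status) #:transparent)

;; A listener is a procedure of one argument, the event.
(define (run-with-listener listener header thunk)
  (listener (test-started header))   ; sent before the test body runs
  (define status
    (with-handlers ([exn:fail? (lambda (e) 'error)])
      (thunk)
      'success))
  (listener (test-finished header status))
  status)

;; Example listener that records events, most recent first.
(define events '())
(define (recording-listener ev)
  (set! events (cons ev events)))

(recording-listener (suite-started '("arith")))
(run-with-listener recording-listener
                   (test-header "addition" '("arith") (hash))
                   (lambda () (+ 1 2)))
(reverse events)
```

A gui listener could create its entry on test-started and fill in the status on test-finished, which is what lets the user see a hanging test.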
Ryan