On Feb 6, 2009, at 11:01 AM, Eric Kow wrote:
> As Petr points out, this approach has its disadvantages:
> 1. Slow: it takes us one whole year to get rid of anything, as
> major darcs releases are every 6 months;
> 2. Time-consuming: it causes us to split our time between
> maintaining the old stuff and working on the new stuff; and
> 3. Potentially bad engineering: we're increasing the number of
> possible code paths (yuck! conditional compilation!), thereby
> reducing the amount of time that each path is explored.
> So despite my claims, it's not even completely clear that the
> so-called "conservative" sunset procedure is the sort of responsible
> engineering practice that it aspires to be.
These are pretty strong criticisms. It could be that the sunset
approach would make new releases buggier instead of more stable. I
have some experience with this sort of approach, and while I don't
know whether my experience generalizes to the darcs codebase, I can
say that the sunset approach didn't work out well for me.
> Third, we want to make sure that we never break darcs, because
> sometimes Life Just Happens: deadlines pile up at work, buses hit
> people, hackers get girlfriends, babies are born.
This is a good consideration to keep in mind (and by the way the same
thing happens in proprietary software development in a company --
priorities change all the time). I think the right approach to this
is to have the policy of "trunk is always good". You never break
trunk at time T, intending to fix it again at time T+n. Instead you
do whatever work you need to make it good in a branch, and then
commit it to trunk once it is strictly better than the current
trunk. A corollary of this is that if a patch lands in trunk and is
then discovered to contain a regression, that patch is rolled back.
Obviously this is pretty much impossible without test-driven
development. If you don't have thorough tests, then how do you know
if you're breaking things with your patches?
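To make that concrete, here is a minimal sketch of the kind of
regression test I mean, in Haskell with HUnit. The function under
test is made up for illustration -- it is not darcs's real API:

    import Test.HUnit

    -- Hypothetical function under test: each bug that gets fixed in
    -- trunk gets a test like this pinning down the correct behavior,
    -- so a later patch that reintroduces the bug fails the suite.
    formatPatchName :: String -> String
    formatPatchName = takeWhile (/= '\n')

    tests :: Test
    tests = TestList
      [ "drops everything after the first newline" ~:
          formatPatchName "fix parser\ndetails" ~?= "fix parser"
      , "leaves single-line names alone" ~:
          formatPatchName "fix parser" ~?= "fix parser"
      ]

    main :: IO ()
    main = runTestTT tests >> return ()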
You have some good questions about test-driven development:
> 1. The kinds of things the sunset procedure aims to catch are
> integration errors (unexpected interaction between different parts
> of darcs), and also real-world errors (e.g. HTTP not working behind
> proxies) that seem tricky to capture in laboratory conditions. I
> don't mean to say "we shouldn't do automated testing because it
> can't cover everything". Of course we should do more automated
> testing. But how should we catch the real-world errors?
My experiences with Twisted, and with Brian Warner on Tahoe, have
taught me that such issues are a lot more programmable and
reproducible than I had thought. Things that I used to consider
obviouly "manual", like "Write to the AIX user and ask him to
misconfigure his network in that same way again and try again with
this new build", are to these guys "automatable", like "Run a
buildslave on AIX, figure out exactly which parts of our source code
can be affected by network misconfiguration, and test how that code
handles that effect.".
This is not to deny your point -- certainly integration and "real
world" are always full of surprises, and some things can't be
automated with reasonable effort, and you will always want manual
testing after all the automated testing is done. But what I've
learned is that automated testing can address 90% of those cases that
I formerly thought required manual testing.
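For example -- a sketch only, with made-up names, not darcs's actual
HTTP code -- if the function that fetches over the network takes the
low-level connect operation as a parameter, then a test can inject
one that fails the way a broken proxy does, and assert that the
caller degrades gracefully:

    import Control.Exception (IOException, try)

    -- Hypothetical: the fetcher is parameterized over the operation
    -- that actually opens the connection, so a test can inject failure.
    fetchUrl :: (String -> IO String) -> String -> IO (Either String String)
    fetchUrl connect url = do
      result <- try (connect url)
      case result of
        Left e  -> return (Left ("could not fetch " ++ url ++ ": "
                                 ++ show (e :: IOException)))
        Right s -> return (Right s)

    -- Simulate a proxy that accepts the request and then hangs up --
    -- the sort of misconfiguration that is painful to reproduce by hand.
    brokenProxy :: String -> IO String
    brokenProxy _ = ioError (userError "connection reset by proxy")

    main :: IO ()
    main = do
      r <- fetchUrl brokenProxy "http://example.org/repo"
      case r of
        Left msg -> putStrLn ("ok, failed gracefully: " ++ msg)
        Right _  -> putStrLn "FAIL: expected a graceful failure"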
> 2. I'm not sure how to go about testing IO-intensive stuff (I guess
> our functional tests, i.e. the shell scripts, are a good example).
In Twisted and in Tahoe, I've seen two complementary approaches
taken. One is the lower-level, "unit test" sort of approach --
figure out what functions will be called with what sort of inputs in
response to the I/O, and thoroughly test those functions under those
inputs. Haskell should be *great* at this, right? The whole *point*
of side-effect-free programming is that you don't have to worry about
things *other* than the arguments affecting the computation.
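QuickCheck is the natural tool for this. Here is a sketch using a
toy patch type -- not darcs's real representation, just enough
structure to state a law -- where QuickCheck generates random inputs
looking for a counterexample:

    import Test.QuickCheck

    -- A toy stand-in for a patch, for illustration only.
    data Patch = AddLine Int String | RemoveLine Int String
      deriving (Eq, Show)

    invert :: Patch -> Patch
    invert (AddLine i s)    = RemoveLine i s
    invert (RemoveLine i s) = AddLine i s

    instance Arbitrary Patch where
      arbitrary = do
        i <- arbitrary
        s <- arbitrary
        elements [AddLine i s, RemoveLine i s]

    -- The law: inverting twice is the identity. Because invert is
    -- pure, nothing but the argument can affect the result.
    prop_invertInvolution :: Patch -> Bool
    prop_invertInvolution p = invert (invert p) == p

    main :: IO ()
    main = quickCheck prop_invertInvolution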
The other is a more holistic "functional test" approach -- simulate
the circumstances that the code under test is required to handle. If
you want to test that the code handles a user who mashes down the "n"
key, then launch a subprocess, exec darcs in that subprocess, send a
thousand "n" chars on its stdin, and examine how it behaves. If
Haskell is not already good at this sort of thing, you can always
write your functional tests (as now) in bash (ugh), Perl (ugh),
Python (yay!), or something else. But Haskell is probably going to
get good at it, because Haskell is growing up, and this is the kind
of task that a modern, well-rounded, practical language needs to be
good at.
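And for what it's worth, that key-mashing test is already writable
in Haskell today with System.Process. A rough sketch -- the darcs
subcommand and the expected outcome here are placeholders:

    import System.Exit (ExitCode(..))
    import System.IO (hClose, hGetContents, hPutStr)
    import System.Process (runInteractiveProcess, waitForProcess)

    -- Launch darcs in a subprocess, send a thousand "n" chars on its
    -- stdin, and check that it exits cleanly rather than looping.
    main :: IO ()
    main = do
      (inH, outH, _errH, ph) <-
          runInteractiveProcess "darcs" ["revert"] Nothing Nothing
      hPutStr inH (replicate 1000 'n')
      hClose inH
      out <- hGetContents outH
      -- Force the output before waiting, so the pipe cannot fill up
      -- and block the child.
      length out `seq` return ()
      code <- waitForProcess ph
      case code of
        ExitSuccess   -> putStrLn "ok: darcs survived a thousand n's"
        ExitFailure n -> putStrLn ("FAIL: darcs exited with " ++ show n)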
> 3. It seems that for a heavy reliance on testing to work, we are
> going to need much, much wider test coverage. How do we break out
> of this chicken-and-egg problem? Do we put everything on hold and
> launch a massive darcs testing initiative?
What the Twisted folks did when switching from their previous
practices to the Ultimate Quality Development System was simply to
mandate that any new patches had to fully satisfy the new
requirements. This works well, because if the current code contains
bugs, then at least they are old bugs, and in practice it causes less
havoc to keep old bugs than it would to replace them with new bugs.
The result of Twisted's practice has been a near-monotonic
improvement in code quality -- the rate at which new bugs are
introduced by patches is now much lower than the rate at which old
bugs are fixed by patches.
Regards,
Zooko
---
Tahoe, the Least-Authority Filesystem -- http://allmydata.org
store your data: $10/month -- http://allmydata.com/?tracking=zsig