On Feb 6, 2009, at 11:01 AM, Eric Kow wrote:

As Petr points out, this approach has its disadvantages:
1. Slow: it takes us one whole year to get rid of anything, as major darcs releases are every 6 months.
2. Time-consuming: it causes us to split our time between maintaining the old stuff and working on the new stuff.
3. Potentially bad engineering: we're increasing the number of possible code paths (yuck! conditional compilation!), thereby reducing the amount of time that each path is explored.

So despite my claims, it's not even completely clear that the so-called "conservative" sunset procedure is the sort of responsible engineering practice that it aspires to be.

These are pretty strong criticisms. It could be that the sunset approach would make new releases buggier rather than more stable. I have some experience with this sort of approach, and while I don't know whether my experience generalizes to the darcs codebase, I can say that the sunset approach didn't work out well for me.

Third, we want to make sure that we never break darcs, because sometimes Life Just Happens: deadlines pile up at work, buses hit people, hackers get girlfriends, babies are born.

This is a good consideration to keep in mind (and, by the way, the same thing happens in proprietary software development in a company -- priorities change all the time). I think the right approach to this is a policy of "trunk is always good". You never break trunk at time T intending to fix it again at time T+n. Instead, you do whatever work you need in a branch, and commit it to trunk only once it is strictly better than the current trunk. A corollary is that if a patch lands in trunk and is then discovered to contain a regression, that patch is rolled back.

Obviously this is pretty much impossible without test-driven development. If you don't have thorough tests, then how do you know if you're breaking things with your patches?

You have some good questions about test-driven development:

1. The kinds of things the sunset procedure aims to catch are integration errors (unexpected interaction between different parts of darcs), and also real-world errors (e.g. HTTP not working behind proxies) that seem tricky to capture in laboratory conditions. I don't mean to say "we shouldn't do automated testing because it can't cover everything". Of course we should do more automated testing. But how should we catch the real-world errors?

My experiences with Twisted, and with Brian Warner on Tahoe, have taught me that such issues are a lot more programmable and reproducible than I had thought. Things that I used to consider obviously "manual", like "Write to the AIX user and ask him to misconfigure his network in that same way again and try again with this new build", are to these guys "automatable", like "Run a buildslave on AIX, figure out exactly which parts of our source code can be affected by network misconfiguration, and test how that code handles that effect".
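
For example -- and this is a minimal sketch with made-up names, not actual darcs code -- if the network-facing action is passed in as a parameter, a test can substitute one that fails the way a misconfigured network would, and exercise the error handling without any real network at all:

    {-# LANGUAGE ScopedTypeVariables #-}
    import Control.Exception (IOException, throwIO, try)

    -- Hypothetical code under test: retry a fetch once, then turn the
    -- failure into a readable error instead of crashing.
    fetchWithRetry :: IO String -> IO (Either String String)
    fetchWithRetry fetch = do
      first <- try fetch
      case first of
        Right body -> return (Right body)
        Left (_ :: IOException) -> do
          second <- try fetch
          case second of
            Right body -> return (Right body)
            Left (e :: IOException) ->
              return (Left ("fetch failed twice: " ++ show e))

    -- The "misconfigured network" is just an IO action that always throws.
    main :: IO ()
    main = do
      result <- fetchWithRetry (throwIO (userError "no route to host"))
      case result of
        Left msg -> putStrLn ("handled as expected: " ++ msg)
        Right _  -> error "test failed: expected the fetch to fail"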

This is not to deny your point -- certainly integration and the "real world" are always full of surprises, some things can't be automated with reasonable effort, and you will always want manual testing after all the automated testing is done. But what I've learned is that automated testing can address 90% of the cases that I formerly thought required manual testing.

2. I'm not sure how to go about testing IO-intensive stuff (I guess our functional tests, i.e. the shell scripts, are a good example)

In Twisted and in Tahoe, I've seen two complementary approaches taken. One is the lower-level, "unit test" sort of approach -- figure out what functions will be called with what sort of inputs in response to the I/O, and thoroughly test those functions under those inputs. Haskell should be *great* at this, right? The whole *point* of side-effect-free programming is that you don't have to worry about things *other* than the arguments affecting the computation.
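
To make this concrete, here's a minimal sketch of a property-based test with QuickCheck; "normalize" is a made-up stand-in for some pure function inside darcs, not actual darcs code:

    import Test.QuickCheck

    -- Hypothetical pure function under test: canonicalize path
    -- separators so Windows-style paths compare equal to Unix ones.
    normalize :: String -> String
    normalize = map (\c -> if c == '\\' then '/' else c)

    -- Because normalize is side-effect-free, its argument is the whole
    -- story, so we can state laws about it and let QuickCheck hunt for
    -- counterexamples.
    prop_idempotent :: String -> Bool
    prop_idempotent s = normalize (normalize s) == normalize s

    prop_noBackslash :: String -> Bool
    prop_noBackslash s = '\\' `notElem` normalize s

    main :: IO ()
    main = do
      quickCheck prop_idempotent
      quickCheck prop_noBackslash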

The other is a more holistic "functional test" approach -- simulate the circumstances that the code under test is required to handle. If you want to test that the code handles a user who mashes down the "n" key, then launch a subprocess, exec darcs in that subprocess, send a thousand "n" chars on its stdin, and examine how it behaves. If Haskell is not already good at this sort of thing, you can always write your functional tests (as currently) in bash (ugh), Perl (ugh), Python (yay!) or something, but Haskell will probably get good at it, because Haskell is growing up, and this is something a modern, well-rounded practical language needs to do well.
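
For what it's worth, here's a rough sketch of such a test in Haskell using System.Process; the "darcs revert" command is just illustrative, and a real test would first set up a scratch repository with some changes to answer "n" about:

    import Control.Exception (evaluate)
    import System.Exit (ExitCode (..))
    import System.IO (hClose, hGetContents, hPutStr)
    import System.Process (runInteractiveProcess, waitForProcess)

    main :: IO ()
    main = do
      (inH, outH, _errH, pid) <-
          runInteractiveProcess "darcs" ["revert"] Nothing Nothing
      -- Simulate a user holding down the "n" key at every prompt.
      hPutStr inH (replicate 1000 'n')
      hClose inH
      out <- hGetContents outH
      -- Force the output before waiting, so a full pipe can't deadlock us.
      _ <- evaluate (length out)
      code <- waitForProcess pid
      case code of
        ExitSuccess   -> putStrLn "ok: darcs coped with a thousand n's"
        ExitFailure n -> error ("darcs exited with failure code " ++ show n)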

3. It seems that for a heavy reliance on testing to work, we are going to need to have much much wider test coverage. How do we break out of this chicken and egg? Do we put everything on hold and launch a massive darcs testing initiative?

What the Twisted folks did when switching from their previous practices to the Ultimate Quality Development System was simply to mandate that any new patches had to fully satisfy the new requirements. This works well, because if the current code contains bugs, then at least they are old bugs, and in practice it causes less havoc to keep old bugs than it would to replace them with new bugs. The result of Twisted's practice has been a near-monotonic improvement in code quality -- the rate at which new bugs are introduced by patches is now much lower than the rate at which old bugs are fixed by patches.

Regards,

Zooko
---
Tahoe, the Least-Authority Filesystem -- http://allmydata.org
store your data: $10/month -- http://allmydata.com/?tracking=zsig