Thanks for these various responses. I agree that I should be checking input more carefully and will do so. In my experience most developers find it useful to allow both GET and POST input, so I would prefer not to deny GET requests outright.
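For what it's worth, the sort of input check I have in mind looks roughly like the sketch below. It is only an illustration, not the real code: the servlet name, the "page" parameter and the whitelist pattern are all made up. The idea is simply to accept the parameter from either a GET or a POST request, but validate it before acting on it.

    import java.io.IOException;
    import java.util.regex.Pattern;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Sketch only: parameter name and pattern are hypothetical examples.
    public class EditServlet extends HttpServlet {

        // Accept only simple alphanumeric page names; reject anything else.
        private static final Pattern PAGE_NAME =
                Pattern.compile("[A-Za-z0-9]{1,100}");

        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            handle(req, resp);   // GET input is still accepted, just validated
        }

        @Override
        protected void doPost(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            handle(req, resp);   // POST goes through the same validation
        }

        private void handle(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            String page = req.getParameter("page");
            if (page == null || !PAGE_NAME.matcher(page).matches()) {
                resp.sendError(HttpServletResponse.SC_BAD_REQUEST,
                        "Invalid page name");
                return;
            }
            // ... proceed with the validated request ...
            resp.getWriter().println("OK: " + page);
        }
    }

Validating the value rather than rejecting the HTTP method means GET form submissions can still be accepted without letting arbitrary input straight through.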
But I do agree with Doug's fix to stop the crawler following POST links, since the recommendation is that POST requests be used where side effects are likely (see http://www.w3.org/2001/tag/doc/whenToUseGet.html#checklist). I assume this fix will make it into 0.7.2 at some point, if I don't want to build from CVS.

I'm not quite sure Jack's response about Stanford's HiWE search engine was a direct answer to my question, but it does raise the question of whether some crawlers will always believe they have valid reasons to submit POST forms in an effort to discover "the hidden web". This seems very reminiscent of the Google Web Accelerator saga earlier this year (e.g. see http://www.sitepoint.com/newsletter/viewissue.php?id=3&issue=113&format=html), although that caused problems even with plain GET hrefs that had side effects (a bad idea in itself!), and usually only when users were logged in.

Andy Read
www.azurite.co.uk