A Nutch bot is crawling a 'submit' page on my site, and it shouldn't be. It's the only bot that hits that page, and unfortunately each hit generates a blank email.
Needless to say, I now know I need to change my software so that it doesn't generate an email on a false hit, but the bot shouldn't be spidering that page anyway. The only way to reach it is through a form's submit 'action'; there is no href 'link' to it anywhere. I've also just added a robots.txt entry, so if the bot works as advertised, I'm not likely to see any more of these.

A couple of log entries showing the issue:

    - - [16/Feb/2006:18:51:01 -0800] "GET /booking/submit.php HTTP/1.0" 200 3728 "-" "NutchCVS/0.7.1 (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)"
    - - [12/Mar/2006:22:50:49 -0800] "GET /booking/submit.php HTTP/1.0" 200 3761 "-" "NutchCVS/0.7.1 (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)"
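Both fixes mentioned above can be sketched in a few lines. The example below is a rough illustration in Python, not the site's actual PHP code. It assumes a robots.txt that disallows /booking/submit.php for every agent (the post doesn't show the real entry) and hypothetical required form fields named "name" and "email". is_blocked() uses the standard library's urllib.robotparser to confirm that the rule matches the Nutch user-agent seen in the logs. should_send_email() is the server-side guard: it only sends mail when every required field is present and non-blank, so a bare GET with no form data, like the ones in the logs, does nothing.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the actual entry on the site is not shown.
ROBOTS_TXT = """\
User-agent: *
Disallow: /booking/submit.php
"""

# Hypothetical names of the fields the booking form must supply.
REQUIRED_FIELDS = ("name", "email")


def is_blocked(user_agent: str, path: str) -> bool:
    """Return True if ROBOTS_TXT tells this user-agent not to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(user_agent, path)


def should_send_email(form: dict) -> bool:
    """Send only when every required field is present and non-blank.

    A crawler that requests the page without submitting the form
    supplies no fields, so this returns False and no email goes out.
    """
    return all(str(form.get(field, "")).strip() for field in REQUIRED_FIELDS)
```

robots.txt is purely advisory: a well-behaved crawler honours it, but nothing forces it to. The guard in the submit handler is what actually stops the blank emails. If the form uses method="post", rejecting plain GET requests is a further check worth adding.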