decomposing URLs issue

Brian Ziman Tue, 13 Jun 2006 15:44:45 -0700

Dear Nutch Project Gurus,

I'm the webmaster of http://swisspig.net/, and I have noticed periodic access by the Nutch crawler at U Washington. However, today's access was strange, in that it attempted to crawl to a *portion* of a URL (which of course is not a link in itself). This might be a bug in the crawler, or a bug in a modification made by the UW folks. The relevant log snippets are:

128.208.6.200 - - [11/Jun/2006:18:27:27 -0400] "GET /robots.txt HTTP/1.0" 200 262 "" "NutchCVS/0.8-dev (Nutch running at UW; http://www.nutch.org/docs/en/bot.html; [EMAIL PROTECTED])"
128.208.6.200 - - [11/Jun/2006:18:27:28 -0400] "GET /post.php HTTP/1.0" 200 25000 "" "NutchCVS/0.8-dev (Nutch running at UW; http://www.nutch.org/docs/en/bot.html; [EMAIL PROTECTED])"
128.208.6.200 - - [11/Jun/2006:18:27:33 -0400] "GET / HTTP/1.0" 200 25000 "" "NutchCVS/0.8-dev (Nutch running at UW; http://www.nutch.org/docs/en/bot.html; [EMAIL PROTECTED])"
128.208.6.200 - - [11/Jun/2006:18:27:38 -0400] "GET /r/post/ HTTP/1.0" 200 25000 "" "NutchCVS/0.8-dev (Nutch running at UW; http://www.nutch.org/docs/en/bot.html; [EMAIL PROTECTED])"

Please note that http://swisspig.net/post.php and http://swisspig.net/r/post/ are scripts (the same script actually -- I recently migrated from the format "/post.php?id=foo" to "/r/post/foo") that are not meant to be accessed directly. There are of course no links from http://swisspig.net/ to these URLs.
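
To illustrate what I mean by crawling to a portion of a URL, here is a minimal Python sketch of the kind of decomposition I suspect. This is only a guess at the behavior, not code from Nutch; the helper name `path_prefixes` is hypothetical, and "foo" is just the placeholder id from the formats above.

```python
from urllib.parse import urlsplit

def path_prefixes(url):
    """Hypothetical helper: list the directory-style prefixes of a URL, shortest first."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    base = f"{parts.scheme}://{parts.netloc}"
    prefixes = [base + "/"]
    # Rebuild each leading portion of the path, leaving out the final segment.
    for i in range(1, len(segments)):
        prefixes.append(base + "/" + "/".join(segments[:i]) + "/")
    return prefixes

for prefix in path_prefixes("http://swisspig.net/r/post/foo"):
    print(prefix)
```

For a real link such as http://swisspig.net/r/post/foo, this yields http://swisspig.net/r/post/ among its results, which matches the last request in the log. It also yields http://swisspig.net/r/, which does not appear in the log, so the crawler's actual logic presumably differs from this sketch.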



Regards,
Brian Ziman
webmaster, swisspig.net
