After reading this, I ran a Google search from the advanced search page, listing all the pages indexed for one of my domains. The results include URLs that I have explicitly blocked in my robots.txt file.
It appears that Googlebot is not a well-behaved crawler.

Fred

----- Original Message -----
From: "Klaus Johannes Rusch" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; "Internet robots, spiders, web-walkers, etc." <robots@mccmedia.com>
Sent: Friday, August 26, 2005 6:09 PM
Subject: Re: [Robots] Googlebot complaint (anyone from Google reading?)

> [EMAIL PROTECTED] wrote:
>
> >Over at Simpy.com I've been watching Googlebot for weeks now, hitting
> >the same (invalid) feed URLs over and over and over and over and
> >getting a 500 error response every time:
> >
> >66.249.66.241 - - [26/Aug/2005:13:40:31 -0400] "GET
> >/simpy/LinksFeed.do?type=3amp;op=rssamp;version=2.0 HTTP/1.1" 500 6370
> >"-" "Mozilla/5.0 (compatible; Googlebot/2.1;
> >+http://www.google.com/bot.html)"
> >
> >If there is anyone from Google on this list, could you please remove
> >this URL (and a few other similar ones from simpy.com with the same
> >problem) from Googlebot's queue?
>
> A couple of things you could try:
>
> * Try blocking the URL path in your robots.txt file (specifically for
>   Googlebot if you like) and submit the URL for quick removal (see
>   http://www.google.com/webmasters/remove.html for details).
> * Identify whether there are any inbound links from other sites. Even
>   if you remove the content, Googlebot may come back if other sites
>   have links to this URL (which looks like a misprinted version of
>   http://www.simpy.com/simpy/BookmarksFeeds.do?type=3&op=rss&version=2.0).
> * The HTTP response is rather unusual, with a very long error message
>   that might confuse some bots. Try replacing it with 500 Server error
>   or, even better, 400 Bad request or 404 Not found, instead of
>   500 javax%2Eservlet%2Ejsp%2EJspException%3A+An+error+occurred+while+evaluating+custom+action+attribute+%22test%22+with+value+%22%24%7Bparam%2Etype+%3D%3D+3%7D%22%3A+An+exception+occured+trying+to+convert+String+%223amp%3Boprssamp%3Bversion2%2E0%22+to+type+%22java%2Elang%2ELong%22+%28null%29,
>   or redirect to a valid page with a 301 response.
>
> --
> Klaus Johannes Rusch
> [EMAIL PROTECTED]
> http://www.atmedia.net/KlausRusch/
>
> _______________________________________________
> Robots mailing list
> Robots@mccmedia.com
> http://www.mccmedia.com/mailman/listinfo/robots
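
For anyone who wants to try the first suggestion quoted above (blocking the URL path in robots.txt specifically for Googlebot), a rule can be sanity-checked locally before it is deployed. What follows is only a minimal sketch in Python using the standard library's urllib.robotparser. The robots.txt content and the is_allowed helper are assumptions made up for illustration; they are not the actual simpy.com file, and Python's parser is not necessarily how Googlebot itself interprets the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration,
# not the real simpy.com file): block the feed path for Googlebot only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /simpy/LinksFeed.do
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The malformed URL from the access log and the URL it was probably meant to be.
bad_url = "http://www.simpy.com/simpy/LinksFeed.do?type=3amp;op=rssamp;version=2.0"
good_url = "http://www.simpy.com/simpy/BookmarksFeeds.do?type=3&op=rss&version=2.0"

if __name__ == "__main__":
    for agent in ("Googlebot", "SomeOtherBot"):
        for url in (bad_url, good_url):
            print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

Because Disallow rules are matched as path prefixes, the single rule covers every query-string variant of the broken feed URL while leaving the valid feed path and other crawlers untouched.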