That's all good info, but it doesn't apply if you are on GAE. If you are on GAE, you can't specify your crawl rate; your app is assigned a special crawl rate.
From: [email protected] [mailto:[email protected]] On Behalf Of Tim
Sent: Tuesday, September 13, 2011 7:55 AM
To: [email protected]
Subject: [google-appengine] Re: Something to pass along to the google search team

Google Webmaster Tools (https://www.google.com/webmasters/tools/home) lets you (amongst other things) submit sitemaps and see the crawl rate for your site over the previous 90 days. There's also a form to report problems with how Googlebot is accessing your site: https://www.google.com/webmasters/tools/googlebot-report

The crawl rate is adjusted to try to avoid overloading your site, but since GAE will just fire up more instances, I guess Googlebot concludes that your site is built for that kind of traffic and just keeps raising the crawl rate.

You could try to mimic a site being killed by the crawler: keep basic stats in memcache every time you get hit by Googlebot (as identified by request headers), and if the requests come too thick and fast, delay the responses or simply return a 408, or maybe a 503 or 509, response. My guess is you'll see the crawl rate back off pretty quickly. http://en.wikipedia.org/wiki/List_of_HTTP_status_codes

It would be nice if robots.txt or sitemap files let you specify a maximum crawl rate (cf. RSS files), or if people agreed on an HTTP status code for a "we're close, but not THAT close..." response to tell crawlers to back off (418 perhaps :) but I don't expect those standards have moved very much recently...

--
T

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/92F2o_-16zMJ.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
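A rough sketch of Tim's throttling idea, in Python since that is one of the GAE runtimes: count Googlebot hits per fixed time window and, once a threshold is passed, answer with a 503 plus a Retry-After header until the window rolls over. The thresholds and the names is_googlebot and check_crawler are hypothetical, and a plain dict stands in for memcache so the example runs on its own; on App Engine the counter would presumably live in memcache so that all instances share it. Matching on the User-Agent header is easy to spoof, so treat this as an untested illustration, not a recipe.

```python
import time

# Hypothetical thresholds: tune to what your app can comfortably serve.
MAX_HITS_PER_WINDOW = 60   # Googlebot requests allowed per window
WINDOW_SECONDS = 60        # length of each counting window, in seconds

# Stand-in for memcache (assumption): a plain dict holding only the
# counter for the current window. On GAE this would be a shared cache.
_counters = {}


def is_googlebot(headers):
    """Identify Googlebot by its User-Agent header (trivially spoofable)."""
    return "googlebot" in headers.get("User-Agent", "").lower()


def check_crawler(headers, now=None):
    """Return (status, extra_headers) for an incoming request.

    200 means serve the page normally; 503 with Retry-After asks the
    crawler to come back once the current window has ended.
    """
    if not is_googlebot(headers):
        return 200, {}
    now = time.time() if now is None else now
    key = "googlebot-hits-%d" % int(now // WINDOW_SECONDS)
    if key not in _counters:
        # A new window has started: drop the previous window's counter.
        _counters.clear()
    _counters[key] = _counters.get(key, 0) + 1
    if _counters[key] > MAX_HITS_PER_WINDOW:
        # Seconds until the current window ends (always at least 1).
        retry_after = WINDOW_SECONDS - int(now % WINDOW_SECONDS)
        return 503, {"Retry-After": str(retry_after)}
    return 200, {}
```

A request handler would call check_crawler with the incoming request headers before doing any real work and return early when it gets a 503. Of the codes Tim lists, 503 is the one defined for temporary overload and the one that carries Retry-After, which is why the sketch uses it rather than 408 or the non-standard 509.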
