That's all good info, but it doesn't apply if you are on GAE. If you are on
GAE you can't specify your crawl rate. It is assigned a special crawl rate.

 

From: [email protected]
[mailto:[email protected]] On Behalf Of Tim
Sent: Tuesday, September 13, 2011 7:55 AM
To: [email protected]
Subject: [google-appengine] Re: Something to pass along to the google search
team

 

 

Google webmaster tools 

 

  https://www.google.com/webmasters/tools/home

 

lets you (amongst other things) submit sitemaps and see the crawl rate for
your site (for the previous 90 days). There's also a form to report problems
with how googlebot is accessing your site

 

  https://www.google.com/webmasters/tools/googlebot-report

 

The crawl rate is modified to try to avoid overloading your site, but given
that GAE will just fire up more instances, I guess googlebot thinks
your site is built for such traffic and just keeps upping the crawl rate.
You could try to mimic a site being killed by the crawler: keep basic
stats in memcache every time you get hit by googlebot (as identified by
request headers) and, if the requests come too thick and fast, delay the
responses or simply return a 408, or maybe a 503 or 509, response. My
guess is you'll see the crawl rate back off pretty quickly.

 

  http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
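
A rough sketch of that idea in Python, in case it helps. Everything named
here is an assumption for illustration: the window size, the threshold, the
helper names, and the CounterStore class, which is only an in-memory
stand-in so the snippet runs anywhere. On GAE you would keep the counter in
memcache instead. Whether googlebot really backs off in response is still a
guess.

```python
import time

WINDOW_SECONDS = 60         # assumed length of one counting window
MAX_HITS_PER_WINDOW = 120   # assumed crawler budget per window


class CounterStore(object):
    """In-memory stand-in for memcache (hypothetical; keys never expire here)."""

    def __init__(self):
        self._counts = {}

    def incr(self, key):
        # Create the counter on first use, then bump it and return the new value.
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key]


def is_googlebot(headers):
    # Identify the crawler from the request headers, as suggested above.
    # Assumes the caller passes a dict with a "User-Agent" key.
    return "googlebot" in headers.get("User-Agent", "").lower()


def throttle_status(headers, store, now=None):
    """Return 503 once googlebot exceeds the budget for the current window, else 200."""
    if not is_googlebot(headers):
        return 200
    now = time.time() if now is None else now
    window = int(now // WINDOW_SECONDS)           # fixed-window bucket number
    hits = store.incr("googlebot:%d" % window)    # one counter per window
    return 503 if hits > MAX_HITS_PER_WINDOW else 200
```

A request handler would call throttle_status(request.headers, store) first
and, on 503, send that status instead of rendering the page. A fixed window
is the simplest thing that could work; it lets through a burst at each
window boundary, which is probably fine for this purpose.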

 

Would be nice if robots.txt or sitemap files let you specify a maximum crawl
rate (cf RSS files), or perhaps people agreed on an HTTP status code for
"we're close, but not THAT close..." response to tell crawlers to back off
(418 perhaps:) but I don't expect those standards have moved very much
recently...

 

--

T

 

-- 
You received this message because you are subscribed to the Google Groups
"Google App Engine" group.
To view this discussion on the web visit
https://groups.google.com/d/msg/google-appengine/-/92F2o_-16zMJ.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/google-appengine?hl=en.
