Sure, but if they just went breadth-first (putting pages to crawl into the tail 
of a work queue that spans hundreds of sites), then there wouldn't be a spike 
at all.
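A rough sketch of the breadth-first scheme described above, assuming a single shared FIFO frontier for all sites and a plain dict standing in for "fetch the page and extract its links" (both are illustrative simplifications, not how Googlebot is known to work). New URLs go to the tail, so one site's successive pages end up separated by other sites' pages:

```python
from collections import deque


def crawl_order(seeds, links):
    """Return the order in which a breadth-first crawler would fetch URLs.

    seeds: starting URLs (e.g. one home page per site).
    links: dict mapping a URL to the URLs found on that page; a stand-in
           for actually fetching and parsing each page (hypothetical).
    """
    frontier = deque(seeds)  # one FIFO work queue shared by every site
    seen = set(seeds)
    order = []
    while frontier:
        url = frontier.popleft()  # take the next page from the head...
        order.append(url)
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)  # ...and queue new pages at the tail
    return order


# Three sites, each a simple chain of pages.
site_links = {
    "a/": ["a/1"], "a/1": ["a/2"],
    "b/": ["b/1"], "b/1": ["b/2"],
    "c/": ["c/1"], "c/1": ["c/2"],
}
print(crawl_order(["a/", "b/", "c/"], site_links))
# ['a/', 'b/', 'c/', 'a/1', 'b/1', 'c/1', 'a/2', 'b/2', 'c/2']
```

With hundreds of sites in the queue instead of three, the gap between two hits on the same site grows accordingly, which is why a pure breadth-first crawl would not produce a spike.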

On Sep 13, 2011, at 10:55 AM, Tim wrote:

> 
> Google Webmaster Tools
> 
>   https://www.google.com/webmasters/tools/home
> 
> lets you (amongst other things) submit sitemaps and see the crawl rate for 
> your site (for the previous 90 days). There's also a form to report problems 
> with how Googlebot is accessing your site:
> 
>   https://www.google.com/webmasters/tools/googlebot-report
> 
> The crawl rate is adjusted to try to avoid overloading your site, but given 
> that GAE will just fire up more instances, I guess Googlebot thinks your 
> site is built for such traffic and just keeps upping the crawl rate. You 
> could try to mimic a site being killed by the crawler: keep basic stats 
> in memcache every time you get hit by Googlebot (as identified by request 
> headers), and if the requests come too thick and fast, delay the responses 
> or simply return a 408, 503 or 509 response. My guess is you'll see the 
> crawl rate back off pretty quickly.
> 
>   http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
> 
> It would be nice if robots.txt or sitemap files let you specify a maximum 
> crawl rate (cf. RSS files), or if people agreed on an HTTP status code for a 
> "we're close, but not THAT close..." response that tells crawlers to back off 
> (418 perhaps :) but I don't expect those standards have moved very much 
> recently...
> 
> --
> T
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Google App Engine" group.
> To view this discussion on the web visit 
> https://groups.google.com/d/msg/google-appengine/-/92F2o_-16zMJ.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/google-appengine?hl=en.

