Sure, but if they just went breadth-first (putting pages to crawl into the tail of a work queue that spans hundreds of sites), then there wouldn't be a spike at all.
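
To make that concrete, here's a rough sketch of what I mean by one work queue shared across sites. It's only an illustration: the function name, the `links` dict (standing in for actually fetching and parsing pages), and the example.com-style hosts are all made up, and I have no idea how Googlebot really schedules its work.

```python
from collections import deque


def breadth_first_order(seeds, links, limit):
    """Return the order in which pages would be fetched from one shared FIFO queue.

    seeds: starting URLs, one or more per site.
    links: dict mapping a URL to the URLs found on that page
           (a stand-in for fetching and parsing; hypothetical data).
    limit: maximum number of pages to fetch.
    """
    queue = deque(seeds)
    seen = set(seeds)
    order = []
    while queue and len(order) < limit:
        url = queue.popleft()          # take work from the head
        order.append(url)
        for found in links.get(url, []):
            if found not in seen:
                seen.add(found)
                queue.append(found)    # new pages go to the tail
    return order


seeds = ["a.example/0", "b.example/0", "c.example/0"]
links = {
    "a.example/0": ["a.example/1", "a.example/2"],
    "b.example/0": ["b.example/1", "b.example/2"],
    "c.example/0": ["c.example/1", "c.example/2"],
}
for url in breadth_first_order(seeds, links, limit=9):
    print(url)
```

Links found on the same page still sit next to each other in the queue, so the spacing for any one site comes from how many pages of other sites are already queued ahead of them. With three sites that gap is tiny; with hundreds it would be large.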

On Sep 13, 2011, at 10:55 AM, Tim wrote:

> Google webmaster tools
>
> https://www.google.com/webmasters/tools/home
>
> lets you (amongst other things) submit sitemaps and see the crawl rate for
> your site (for the previous 90 days). There's also a form to report problems
> with how googlebot is accessing your site
>
> https://www.google.com/webmasters/tools/googlebot-report
>
> The crawl rate is modified to try to avoid overloading your site, but given
> that GAE will just fire up more instances, then I guess googlebot thinks your
> site is built for such traffic and just keeps upping the crawl rate. You
> could try and mimic a site being killed by the crawler.... keep basic stats
> in memcache every time you get hit by googlebot (as identified by request
> headers) and if the requests come too thick and fast, delay the responses, or
> simply return a 408 or maybe a 503 or 509 response, and my guess is you'll
> see the crawl rate back off pretty quickly.
>
> http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
>
> Would be nice if robots.txt or sitemap files let you specify a maximum crawl
> rate (cf RSS files), or perhaps people agreed on an HTTP status code for
> "we're close, but not THAT close..." response to tell crawlers to back off
> (418 perhaps:) but I don't expect those standards have moved very much
> recently...
>
> --
> T
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/google-appengine/-/92F2o_-16zMJ.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.
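
For anyone who wants to try Tim's throttling idea, here is a minimal sketch of the counting logic. Everything in it is an assumption for illustration: the function names, the 60-second window and the limit of 30 hits are invented, and a plain dict stands in for memcache so the snippet runs anywhere. On App Engine you would swap the dict for real memcache calls with an expiry on each key.

```python
import time

# Hypothetical limits, not values from this thread; tune for your own app.
WINDOW_SECONDS = 60
MAX_HITS_PER_WINDOW = 30


def is_googlebot(user_agent):
    """Crude check on the User-Agent request header."""
    return "googlebot" in (user_agent or "").lower()


def crawler_status(user_agent, counters, now=None):
    """Return (status_code, extra_headers) for one incoming request.

    counters: a plain dict standing in for memcache (assumption); old
              window keys are never evicted here, whereas a memcache
              entry would simply expire.
    now:      seconds since the epoch; defaults to the current time.
    """
    if not is_googlebot(user_agent):
        return 200, {}
    now = time.time() if now is None else now
    # One counter per fixed time window.
    key = "googlebot:%d" % int(now // WINDOW_SECONDS)
    counters[key] = counters.get(key, 0) + 1
    if counters[key] > MAX_HITS_PER_WINDOW:
        # Tell the crawler when the current window ends.
        retry_after = WINDOW_SECONDS - int(now % WINDOW_SECONDS)
        return 503, {"Retry-After": str(retry_after)}
    return 200, {}
```

In a request handler you would call `crawler_status(request_user_agent, counters)` first and, on a 503, send that status and the `Retry-After` header instead of rendering the page. Whether the crawl rate actually backs off in response is Tim's guess, not something this sketch can confirm.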
