On Tuesday, 13 September 2011 16:04:41 UTC+1, Joshua Smith wrote:
>
> Sure, but if they just went breadth-first (putting pages to crawl into the 
> tail of a work queue that spans hundreds of sites), then there wouldn't be a 
> spike at all.
>
>
I expect part of the reason is wanting to pull back a series of pages from 
a single site together, so that the set is consistent (especially with 
session cookies, sessions encoded in URLs and the like). Then there are 
little things like HTTP pipelining of requests, the internal management of 
assigning machines (including timeouts, failovers and retries), updating 
databases with results and meta-results, and 101 other things that I can't 
even start to think about. That's not to say it can't be done, but I think 
it'd have a lot of hidden implications.
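
As a rough illustration of the two orderings being discussed, here is a toy 
sketch in Python. It is only a model under stated assumptions, not how 
Googlebot actually schedules work: the function names, the site/page data 
and the "hits per window" measure are all made up for the example. It shows 
why a breadth-first queue flattens the per-site spike, and also why it 
scatters one site's pages across the whole crawl.

```python
from collections import deque
from itertools import chain


def site_at_a_time(pages_by_site):
    """Fetch every page of one site before moving on to the next site."""
    return list(chain.from_iterable(
        [(site, page) for page in pages]
        for site, pages in pages_by_site.items()
    ))


def breadth_first(pages_by_site):
    """Round-robin over sites: one page per site per pass."""
    queues = deque(
        (site, deque(pages)) for site, pages in pages_by_site.items() if pages
    )
    order = []
    while queues:
        site, pages = queues.popleft()
        order.append((site, pages.popleft()))
        if pages:
            # Site still has pages left: send it to the tail of the queue.
            queues.append((site, pages))
    return order


def max_hits_in_window(order, window):
    """Most requests any single site gets within `window` consecutive fetches."""
    worst = 0
    for start in range(len(order)):
        counts = {}
        for site, _page in order[start:start + window]:
            counts[site] = counts.get(site, 0) + 1
        worst = max(worst, max(counts.values()))
    return worst


if __name__ == "__main__":
    # Hypothetical sites and paths, purely for illustration.
    pages = {
        "a.example": ["/1", "/2", "/3"],
        "b.example": ["/1", "/2", "/3"],
        "c.example": ["/1", "/2", "/3"],
    }
    print(max_hits_in_window(site_at_a_time(pages), 3))
    print(max_hits_in_window(breadth_first(pages), 3))
```

In this toy model the site-at-a-time order puts all three of a site's 
requests back to back, while the breadth-first order never hits the same 
site twice within three consecutive fetches. The cost is that a site's 
pages are no longer fetched together, which is where the session and 
consistency concerns above would come in.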

Still, you did say "dunno if it's practical" - I was just wondering about 
other ways to make Googlebot more compatible with GAE and GAE-like systems.

--
T
