Alan Perkins writes:
 > What's the current accepted practice for hit rate?

In general, leave an interval several times longer than the time
taken for the last response. For example, if a site responds in 20 ms,
you can hit it again within the same second; if a site takes 4 seconds
to respond, leave it at least 30 seconds before trying again.
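That rule of thumb can be sketched as a small helper. This is my own
illustrative sketch, not anything from a particular crawler; the
multiplier and floor values are assumptions chosen to roughly match the
examples above.

```python
def politeness_delay(last_response_secs, multiplier=8.0, floor_secs=1.0):
    """Seconds to wait before hitting the same host again.

    The delay scales with the last observed response time (so slow,
    loaded servers get longer gaps) but never drops below a floor.
    Both `multiplier` and `floor_secs` are illustrative assumptions.
    """
    return max(floor_secs, multiplier * last_response_secs)
```

With these values, a 20 ms response gives the 1-second floor, while a
4-second response gives a 32-second gap, in line with the figures above.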

 > B) The number of robots you are running (e.g. 30 seconds per site per
 > robot, or 30 seconds per site across all your robots?)

Generally, take all of your robots into account. If you use a
Mercator-style distribution strategy, where each host is assigned to
exactly one robot, this is a non-issue.
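The usual way to get that assignment is to hash the hostname and take
it modulo the number of robots, so per-host rate limiting needs no
cross-robot coordination. A minimal sketch (the function and parameter
names are my own, not from Mercator itself):

```python
import hashlib

def assign_host(hostname, num_robots):
    """Map a hostname to exactly one robot index in [0, num_robots).

    Hashing the lowercased hostname makes the assignment stable and
    case-insensitive, so every URL on a given host is always fetched
    by the same robot.
    """
    digest = hashlib.sha1(hostname.lower().encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big") % num_robots
```

Because the mapping is deterministic, each robot can enforce the
per-host delay locally and still be correct globally.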

 > D) Some other factor (e.g. server response time, etc.)

Server response time is the biggest factor.

 > E) None of the above (i.e. anything goes)
 > 
 > It's clear from the log files I study that some of the big players are
 > not sticking to 30 seconds.  There are good reasons for this and I
 > consider it a good thing (in moderation).  E.g. retrieving one page from
 > a site every 30 seconds only allows 2880 pages per day to be retrieved
 > from a site and this has obvious "freshness" implications when indexing
 > large sites.

Many large sites are split across several servers. Often these can be
hit in parallel - if your robot is clever enough.
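One way to be that clever is to key the politeness queues on the
server name rather than the site: requests within one queue stay
sequential, but distinct queues (www1, www2, ...) can be drained in
parallel. A sketch under that assumption:

```python
from collections import defaultdict
from urllib.parse import urlsplit

def queues_by_server(urls):
    """Group URLs into per-server fetch queues.

    Each queue is keyed by the host part of the URL, so a large site
    split across several servers yields several queues that a crawler
    may fetch in parallel, while still serializing requests (and
    applying the politeness delay) within each queue.
    """
    queues = defaultdict(list)
    for url in urls:
        queues[urlsplit(url).netloc.lower()].append(url)
    return dict(queues)
```

A site served from www1.example.com and www2.example.com then yields
two independent queues instead of one.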

Richard
_______________________________________________
Robots mailing list
[EMAIL PROTECTED]
http://www.mccmedia.com/mailman/listinfo/robots