Alan Perkins writes:

> What's the current accepted practice for hit rate?
In general, leave an interval several times longer than the time taken
for the last response. E.g. if a site responds in 20 ms, you can hit it
again the same second. If a site takes 4 seconds to respond, leave it at
least 30 seconds before trying again.

> B) The number of robots you are running (e.g. 30 seconds per site per
> robot, or 30 seconds per site across all your robots?)

Generally, take all your robots into account. If you use a Mercator-style
distribution strategy, this is a non-issue.

> D) Some other factor (e.g. server response time, etc.)

Server response time is the biggest factor.

> E) None of the above (i.e. anything goes)
>
> It's clear from the log files I study that some of the big players are
> not sticking to 30 seconds. There are good reasons for this and I
> consider it a good thing (in moderation). E.g. retrieving one page from
> a site every 30 seconds only allows 2880 pages per day to be retrieved
> from a site and this has obvious "freshness" implications when indexing
> large sites.

Many large sites are split across several servers. Often these can be hit
in parallel - if your robot is clever enough.

Richard

_______________________________________________
Robots mailing list
[EMAIL PROTECTED]
http://www.mccmedia.com/mailman/listinfo/robots
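The "interval several times longer than the last response" rule above can be sketched in a few lines of Python. The multiplier of 8 and the half-second floor are illustrative assumptions, not values from the post; they are chosen so the two examples given (20 ms and 4 s responses) come out roughly as described.

```python
def polite_delay(last_response_seconds, multiplier=8.0, floor=0.5):
    """Seconds to wait before hitting the same host again.

    Sketch of the rule above: leave an interval several times
    longer than the last response time. The multiplier and floor
    are illustrative assumptions, not prescribed values.
    """
    return max(multiplier * last_response_seconds, floor)

# A 20 ms response allows another hit well within the same second;
# a 4-second response yields a gap of over 30 seconds.
print(polite_delay(0.02))  # 0.5 (floor applies)
print(polite_delay(4.0))   # 32.0
```

A robot would track the next-allowed fetch time per host from this delay; the exact multiplier is a tuning choice, with slower servers earning proportionally longer back-off.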
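The Mercator-style distribution mentioned above makes cross-robot coordination a non-issue because each hostname is assigned to exactly one crawler process, so only that process ever contacts the host. A minimal sketch, assuming a stable hash of the hostname (the function name and use of MD5 here are illustrative, not from Mercator itself):

```python
from hashlib import md5

def assigned_crawler(host, num_crawlers):
    """Map a hostname to exactly one of num_crawlers processes.

    Hypothetical sketch of Mercator-style host partitioning: because
    the mapping is deterministic, per-host politeness intervals need
    no coordination across robots. MD5 is used only as a stable,
    well-distributed hash, not for security.
    """
    digest = md5(host.lower().encode("utf-8")).hexdigest()
    return int(digest, 16) % num_crawlers

# The same host always lands on the same crawler, regardless of case.
print(assigned_crawler("example.com", 4) ==
      assigned_crawler("EXAMPLE.COM", 4))  # True
```

Python's built-in `hash()` is deliberately avoided here because it is randomized per process, which would break the "same host, same crawler" property across restarts and machines.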