ikong fu wrote:

>I have heard of an algorithm that flags 10 requests within 10 minutes
>from the same IP as a potential spider. However, that still leaves
>potential proxy server requests.

IE allows you to capture a page or a sequence of pages for off-line 
use, which would easily fall afoul of this rule, and I regularly skim 
through news sites at a faster pace than that. 

10 pages in 10 seconds might be a better indicator of a spider, but how 
would you gather that information anyway without reprocessing your 
logfile?
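
For what it's worth, reprocessing the logfile for that is only a few 
lines of script. Here is a rough sketch in Python (the filename 
access.log, the Common Log Format layout, and the 10-in-10-seconds 
threshold are all my assumptions, so adjust to taste):

from collections import defaultdict, deque
from datetime import datetime

THRESHOLD = 10   # flag an IP after this many requests...
WINDOW = 10      # ...within this many seconds

recent = defaultdict(deque)   # per-IP timestamps still in the window
flagged = set()

with open("access.log") as log:
    for line in log:
        fields = line.split()
        if len(fields) < 4:
            continue   # skip malformed lines
        ip = fields[0]
        # Common Log Format timestamp: [01/Jan/2001:12:34:56 +0000]
        try:
            ts = datetime.strptime(fields[3], "[%d/%b/%Y:%H:%M:%S")
        except ValueError:
            continue
        q = recent[ip]
        q.append(ts)
        # discard requests that have fallen out of the window
        while (ts - q[0]).total_seconds() > WINDOW:
            q.popleft()
        if len(q) >= THRESHOLD:
            flagged.add(ip)

for ip in sorted(flagged):
    print(ip)

Run that over yesterday's log and you get a list of candidate IPs, 
which is still only a starting point, for the proxy reasons above.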

But we should differentiate between ordinary spiders, which you want 
to exclude from your visitor statistics (which are of questionable 
value anyway), and "stealth" spiders that want to spider your whole 
site without you knowing about it. (If you're worried about the 
latter, you probably shouldn't be putting your stuff on the web in the 
first place.) Ordinary spiders shouldn't be that hard to spot, but you 
will have to check your filters fairly often, because there's no 
guarantee that there won't be new spiders, or that the old spiders 
won't "evolve".
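
By way of illustration, a first pass at spotting the ordinary ones by 
user-agent might look something like this (a sketch only: it assumes 
Combined Log Format, where the user-agent is the last quoted field, 
and the pattern list is illustrative, exactly the sort of filter that 
needs the regular updating I mean):

import re

# Illustrative patterns only; real lists grow stale quickly.
SPIDER_PATTERNS = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)

def looks_like_spider(logline):
    # In Combined Log Format the user-agent is the last quoted field.
    quoted = re.findall(r'"([^"]*)"', logline)
    return bool(quoted) and bool(SPIDER_PATTERNS.search(quoted[-1]))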

Aengus
