I have heard of a heuristic that flags an IP as a potential spider
when it makes 10 requests within 10 minutes. However, that still leaves
the problem of proxy servers, where requests from many users arrive
from a single IP.
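
A minimal sketch of that heuristic in Python, for illustration only. It
assumes the log has already been parsed into (timestamp, ip) pairs; the
function name, the parameter names and the exact boundary handling are
my own assumptions, not something taken from analog or any other tool.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def flag_spider_ips(requests, max_requests=10, window=timedelta(minutes=10)):
    """Return the set of IPs with at least max_requests hits in any window.

    requests: iterable of (timestamp, ip) pairs (hypothetical input format).
    """
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    flagged = set()
    for ts, ip in sorted(requests):  # process hits in time order
        hits = recent[ip]
        hits.append(ts)
        # Drop hits that are more than one window older than the current hit.
        while ts - hits[0] > window:
            hits.popleft()
        if len(hits) >= max_requests:
            flagged.add(ip)
    return flagged


if __name__ == "__main__":
    start = datetime(1999, 12, 15, 12, 0, 0)
    # One client requesting every 30 seconds, another every 5 minutes.
    fast = [(start + timedelta(seconds=30 * i), "10.0.0.1") for i in range(10)]
    slow = [(start + timedelta(minutes=5 * i), "10.0.0.2") for i in range(10)]
    print(flag_spider_ips(fast + slow))
```

A busy proxy would be flagged in exactly the same way as the fast client
here, so the result is best treated as a list of candidates to check by
hand rather than a definitive list of spiders.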
ikong fu
[EMAIL PROTECTED]
Stephen Turner wrote:
>
> On Wed, 15 Dec 1999, Boris Goldowsky wrote:
>
> >
> > I've been asked to report page-view statistics for our web site
> > which eliminate page views from search-engine spiders and other robots.
> >
> > I've tried to do some of this by coming up with a list of User-Agent
> > strings that look like spiders, but it seems like a hit-or-miss sort
> > of approach.
> >
> > Does anyone know if there are standard lists of spider vs. interactive
> > User-Agents, or is there a better method of filtering out spider page
> > views? Any config-file snippets people would be willing to share?
> >
>
> Of course, some things are obviously spiders. (To humans: generating the
> complete list might be hard). But I think there are also some spiders which
> have a plain Mozilla user-agent. One could only spot these by the speed of
> requests over a period of tens of minutes or hours and filter them out
> manually. I don't know of a good way to do this automatically.
>
> --
> Stephen Turner [EMAIL PROTECTED] http://www.statslab.cam.ac.uk/~sret1/
> Statistical Laboratory, 16 Mill Lane, Cambridge CB2 1SB, England
> "As always, it's considered good practice to temporarily disable any
> virus detection software prior to installing new software." (Netscape)
>
> ------------------------------------------------------------------------
> This is the analog-help mailing list. To unsubscribe from this
> mailing list, send mail to [EMAIL PROTECTED]
> with "unsubscribe analog-help" in the main BODY OF THE MESSAGE.
> List archived at http://www.mail-archive.com/[email protected]/
> ------------------------------------------------------------------------