On Wed, 15 Dec 1999, Boris Goldowsky wrote:
>
> I've been asked to report page-view statistics for our web site
> which eliminate page views from search-engine spiders and other robots.
>
> I've tried to do some of this by coming up with a list of User-Agent
> strings that look like spiders, but it seems like a hit-or-miss sort
> of approach.
>
> Does anyone know if there are standard lists of spider vs. interactive
> User-Agents, or is there a better method of filtering out spider page
> views? Any config-file snippets people would be willing to share?
>
Of course, some things are obviously spiders (obvious to a human reading the
User-Agent, that is; generating the complete list of them is the hard part).
But I think there are also some spiders which present a plain Mozilla
User-Agent. You can only spot those by the speed of their requests over a
period of tens of minutes or hours, and then filter them out manually. I don't
know of a good way to do this automatically.
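For what it's worth, below is a rough post-processing sketch in Python (not an
analog config): it drops hits whose User-Agent contains one of a hand-maintained
list of spider substrings, then lists any remaining hosts that request pages
faster than a human plausibly would, so you can look at those by hand. The
substrings, the access.log filename and the one-request-every-two-seconds
threshold are only illustrative guesses, not any kind of standard list.

#!/usr/bin/env python
# Rough sketch: separate likely-robot hits in a Combined Log Format access log.
# SPIDER_SUBSTRINGS and the rate threshold below are illustrative guesses,
# not a standard or complete list of robots.
import re
from collections import defaultdict
from datetime import datetime

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" \d{3} \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

SPIDER_SUBSTRINGS = ["bot", "crawl", "spider", "slurp"]  # hypothetical list

def parse_time(stamp):
    # e.g. "15/Dec/1999:10:30:00 +0000"; the time zone offset is ignored here
    return datetime.strptime(stamp.split()[0], "%d/%b/%Y:%H:%M:%S")

human_hits = 0
spider_hits = 0
times_by_host = defaultdict(list)

with open("access.log") as fh:          # assumed log file name
    for line in fh:
        m = LOG_RE.match(line)
        if not m:
            continue
        agent = m.group("agent").lower()
        if any(s in agent for s in SPIDER_SUBSTRINGS):
            spider_hits += 1            # obvious spider: known User-Agent fragment
            continue
        human_hits += 1
        times_by_host[m.group("host")].append(parse_time(m.group("time")))

# Hosts averaging more than one request every 2 seconds over at least
# 100 requests look robotic even if they claim a "Mozilla" User-Agent.
suspects = []
for host, times in times_by_host.items():
    times.sort()
    span = (times[-1] - times[0]).total_seconds()
    if len(times) >= 100 and span > 0 and len(times) / span > 0.5:
        suspects.append((host, len(times)))

print("hits matching spider User-Agents:", spider_hits)
print("other hits:", human_hits)
print("hosts worth checking by hand:")
for host, count in sorted(suspects, key=lambda x: -x[1]):
    print("  %s (%d requests)" % (host, count))

Even then the suspects list needs a human eye: a busy proxy or a NAT gateway
can produce a burst of requests from a single host without being a robot.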
--
Stephen Turner [EMAIL PROTECTED] http://www.statslab.cam.ac.uk/~sret1/
Statistical Laboratory, 16 Mill Lane, Cambridge CB2 1SB, England
"As always, it's considered good practice to temporarily disable any
virus detection software prior to installing new software." (Netscape)
------------------------------------------------------------------------
This is the analog-help mailing list. To unsubscribe from this
mailing list, send mail to [EMAIL PROTECTED]
with "unsubscribe analog-help" in the main BODY OF THE MESSAGE.
List archived at http://www.mail-archive.com/[email protected]/
------------------------------------------------------------------------