The Analog docs (http://www.analog.cx/docs/include.html#incregexp) indicate that regular expressions can be used in inclusions and exclusions by prefixing the expression with "REGEXP:" or "REGEXPI:" (the latter matching case-insensitively). The use of regular expressions is covered in detail in the Aliases section of the docs (http://www.analog.cx/docs/alias.html#aliasregexp). Note that a regular expression must be on a line of its own, not within a comma-separated list.
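As a sketch of what that might look like, assuming the robots identify themselves in the User-Agent (browser) field, an exclusion could use BROWEXCLUDE with a case-insensitive pattern. The keywords below are illustrative, not a complete robot list:

```
# Exclude requests whose browser (User-Agent) string contains common
# robot keywords, matched case-insensitively. Each regular expression
# must be on its own line, not in a comma-separated list.
BROWEXCLUDE REGEXPI:.*(robot|crawler|spider).*
```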
There is a portion of the Analog site devoted to helper applications (http://www.analog.cx/helpers/#conffiles). These are apps written by Analog users who share their work with the Analog community. Of particular interest might be Jeremy Wadsack's ROBOTINCLUDE commands and Israel Honukoglu's SEARCHENGINE commands. Though their focus was on inclusion, the information that they offer can be used for exclusion as well.
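For example, the user-agent patterns from an inclusion-oriented list like those could be flipped into exclusions. The exact UA strings below are examples only; check the helper files for the actual lists:

```
# Turn robot-identification patterns into exclusions.
# * is Analog's ordinary wildcard character.
BROWEXCLUDE *Googlebot*
BROWEXCLUDE *Slurp*
BROWEXCLUDE *msnbot*
```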
Hope that helps,
-- Duke
Brad Bull wrote:
Ok, but since the referer field seems to be unreliable for these bots, wouldn't that pretty much mean developing a list of IP addresses and ranges for each robot and building out a long HOSTEXCLUDE section in the config? REGEXP is not available on HOSTEXCLUDE to parse the User-Agent for the known identifiers, correct?
There are some sites on the net that detail best guesses at the IPs and IP ranges used by the major search engines. It would be a long section of the config, like I said, but very doable. I just want to be sure I'm not missing something. Thanks again.
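As a sketch, such a section might look like the following. The addresses are placeholders from the reserved documentation ranges, not verified crawler IPs; substitute real ranges from a current list:

```
# Exclude requests from known crawler IP ranges.
# These addresses are illustrative only -- replace with real ranges.
HOSTEXCLUDE 192.0.2.*
HOSTEXCLUDE 198.51.100.*
```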
Brad
-----Original Message-----
From: Duke Hillard [mailto:[EMAIL PROTECTED]]
Sent: Saturday, June 14, 2003 1:01 PM
To: [EMAIL PROTECTED]
Subject: Re: [analog-help] Excluding Robots - General Summary & Request Report
See "http://www.analog.cx/docs/include.html". The first section (begins "After aliasing ..." and ends "... see below") explains how to exclude all requests from a computer.
HOSTEXCLUDE host.domain.tld
HTH,
-- Duke
Brad Bull wrote:
Help... Just to make sure I'm getting this right... the ROBOTINCLUDE config and the long list of UAs available for download are for the Operating System Report. Unless otherwise filtered, "Robot" log entries are included as Successful Requests and possibly Pages. Correct?
What's the best way to remove known Robot requests from analysis entirely? I don't want to see them in the General Summary as Requests or in the Request Report.
Do I need to recreate that whole ROBOTINCLUDE list as a FILEEXCLUDE, HOSTEXCLUDE, or BROWEXCLUDE? Which one? Isn't there a "FILEEXCLUDE Robots" config or something similar?
Sorry, I'm new to Analog, and the Robots thing is a bit confusing. The number of successful requests is an important benchmark for me and I really want to make that stat as close to "human only" as possible. Any help would be much appreciated.
+------------------------------------------------------------------------
| TO UNSUBSCRIBE from this list:
|   http://lists.isite.net/listgate/analog-help/unsubscribe.html
|
| Digest version: http://lists.isite.net/listgate/analog-help-digest/
| Usenet version: news://news.gmane.org/gmane.comp.web.analog.general
| List archives:  http://www.analog.cx/docs/mailing.html#listarchives
+------------------------------------------------------------------------
