The Analog docs (http://www.analog.cx/docs/include.html#incregexp)
indicate that regular expressions can be used for inclusions and exclusions
by prefixing the expression with "REGEXP:" or "REGEXPI:".  The use of
regular expressions is covered in detail in the Aliases section of the docs
(http://www.analog.cx/docs/alias.html#aliasregexp).  Note that a regular
expression must be on a line of its own, not within a comma-separated list.
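For example, a single pattern can replace several FILEEXCLUDE lines
(the pattern below is just an illustration, not one from the docs):

  FILEEXCLUDE REGEXP:.*\.(gif|jpg)

Use the REGEXPI: prefix instead if you want the match to be
case-insensitive.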

There is a portion of the Analog site devoted to helper applications
(http://www.analog.cx/helpers/#conffiles).  These are apps written
by Analog users who share their work with the Analog community.
Of particular interest might be Jeremy Wadsack's ROBOTINCLUDE
commands and Israel Honukoglu's SEARCHENGINE commands.
Though their focus was on inclusion, the information that they offer
can be used for exclusion as well.
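As a sketch of using that idea for exclusion by user agent (the robot
names below are just examples, not a complete or authoritative list),
something along these lines should drop hits whose browser string
matches known crawler identifiers:

  BROWEXCLUDE REGEXPI:.*(googlebot|slurp|msnbot).*

You would still want to check the resulting reports, since robots that
fake a browser user agent won't be caught this way.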

Hope that helps,

-- Duke


Brad Bull wrote:


Ok, but since the referer field seems to be unreliable for these bots,
wouldn't that pretty much mean developing a list of IP addresses and ranges
for each robot and building out a long HOSTEXCLUDE section of the config
based on that? REGEXP is not available on HOSTEXCLUDE to parse the
User-Agent for the known identifiers, correct?

There are some sites on the net which detail the best guesses on IPs and IP
ranges for the major search engines. It would be a long section of the
config like I said, but very much do-able. I just want to be sure I'm not
missing something. Thanks again.

Brad




-----Original Message-----
From: Duke Hillard [mailto:[EMAIL PROTECTED]]
Sent: Saturday, June 14, 2003 1:01 PM
To: [EMAIL PROTECTED]
Subject: Re: [analog-help] Excluding Robots - General Summary & Request Report



See http://www.analog.cx/docs/include.html. The first section (it begins
"After aliasing ..." and ends "... see below") explains how to exclude all
requests from a computer:

HOSTEXCLUDE host.domain.tld
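Wildcards work here too, so one line can cover a whole domain or an IP
range (the names and addresses below are made up for illustration):

  HOSTEXCLUDE *.crawl.example.com
  HOSTEXCLUDE 192.0.2.*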

HTH,

-- Duke


Brad Bull wrote:




Help... Just to make sure I'm getting this right... the ROBOTINCLUDE
config and the long list of UAs available for download are for the
Operating System Report. Unless otherwise filtered, "Robot" log entries
are included as Requests and possibly Pages. Correct?

What's the best way to remove known Robot requests from analysis
entirely? I don't want to see them in the General Summary as Successful
Requests or in the Request Report.

Do I need to recreate that whole ROBOTINCLUDE list as a FILEEXCLUDE,
HOSTEXCLUDE, or BROWEXCLUDE? Which one? Isn't there a FILEEXCLUDE Robots
config or something similar?

Sorry, I'm new to Analog, and the Robots thing is a bit confusing. The
number of successful requests is an important benchmark for me and I
really want to make that stat as close to "human only" as possible. Any
help would be much appreciated.









+------------------------------------------------------------------------
|  TO UNSUBSCRIBE from this list:
|    http://lists.isite.net/listgate/analog-help/unsubscribe.html
|
|  Digest version: http://lists.isite.net/listgate/analog-help-digest/
|  Usenet version: news://news.gmane.org/gmane.comp.web.analog.general
|  List archives:  http://www.analog.cx/docs/mailing.html#listarchives
+------------------------------------------------------------------------
