Re: [analog-help] robots, including, excluding, and so on

2001-01-30 Thread Stephen Turner

On Mon, 29 Jan 2001, Dennis Nichols wrote:

 Greetings - First, my understanding of ROBOTINCLUDE and ROBOTEXCLUDE is 
 that these are report-level commands, that is, they affect only the 
 Operating System Report. Right?
 

Right. At the moment.

 A previous exchange on this list...
 
 On Thu, 25 Jan 2001, Stephen Turner wrote:
   On Wed, 24 Jan 2001, Aaron Shoblaske wrote:
   In the 4.90beta, is there any way in one line to exclude from the entire
   report all robots that you've defined using robotinclude (eg. ROBOT none)
   or do you still have to type in a bunch of separate BROWEXCLUDEs for each
   robot (eg. BROWEXCLUDE inktomi*, etc.)? And if there isn't would it be
   hard to implement it in the beta?
 
   There isn't. It's a good idea though. I'll do it. Probably.
 
 The above is a good step but something about it seems odd - one says which 
 browsers to include in the Operating System Report as being robots, and 
 then one says exclude all such items from the entire report.
 

Well, I guess the point is that Aaron's suggestion would break the idea of
ROBOT*CLUDE being a report-level command. So then it wouldn't really be
contradictory.

 How about this instead/in addition:
 
 For a selected set of commands, invent a syntax extension that says read 
 the arguments for this command from a file. I could then, for example, put 
 a list of robotish browsers in a file and use any of the following:
 
 ROBOTINCLUDE -FILE filename
 ROBOTEXCLUDE -FILE filename
 BROWEXCLUDE -FILE filename
 BROWINCLUDE -FILE filename
 
 This differs from CONFIGFILE because only the arguments would be in the 
 file, not the commands. This could be generalized to many other commands 
 but it is only really useful where you want to use the same list of 
 arguments for different commands. I think only the item include/exclude 
 commands would get used this way.
 
 Does this make sense?
 

Yes, it makes sense. I'm not sure whether I like it as an idea though. It's
concise, but it possibly seems like too much of a "power user" option, in
that it makes it harder to look in one place and figure out what's going on.

Does anyone else have an opinion on this?

-- 
Stephen Turner   http://www.statslab.cam.ac.uk/~sret1/
  Statistical Laboratory, Wilberforce Road, Cambridge, CB3 0WB, England
  "Your account can only be used for a single internet session at any one
   time and for no more than 24 hours in any one day." (NTL terms of use)


This is the analog-help mailing list. To unsubscribe from this
mailing list, send mail to [EMAIL PROTECTED]
with "unsubscribe" in the main BODY OF THE MESSAGE.
List archived at http://www.mail-archive.com/analog-help@lists.isite.net/




Re: [analog-help] robots, including, excluding, and so on

2001-01-30 Thread Massimo Mezzini

On 30 Jan 2001, at 11:53, Stephen Turner wrote about 
Re: [analog-help] robots, including, excluding, a:


  ROBOTINCLUDE -FILE filename
  ROBOTEXCLUDE -FILE filename
  BROWEXCLUDE -FILE filename
  BROWINCLUDE -FILE filename

  This differs from CONFIGFILE because only the arguments would be in
  the file, not the commands. This could be generalized to many other
  commands but it is only really useful where you want to use the same
  list of arguments for different commands. I think only the item
  include/exclude commands would get used this way.

  Does this make sense?


 Yes, it makes sense. I'm not sure whether I like it as an idea though.
 It's concise, but it possibly seems like too much of a "power user"
 option, in that it makes it harder to look in one place and figure out
 what's going on.
 
 Does anyone else have an opinion on this?

My vote is YES: let's do it, please.

I find new spiders to exclude every week, and it would be a pain
to modify the config files for all the websites I manage. So I'm
already using a system like this: I simply add a

CONFIGFILE no-spiders.txt

line to every main cfg file.

And no-spiders.txt contains a list of HOSTEXCLUDE commands, which gets
constantly updated. I'm not a power user at all, but
something like this really lets me "look in one place and figure out
what's going on", as you said, Stephen.
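Massimo's setup can be sketched in shell. This is a minimal sketch under stated assumptions: the per-site configs live in a hypothetical sites/ directory, the host patterns in no-spiders.txt are made up, and CONFIGFILE is assumed to find no-spiders.txt relative to where analog is run. The loop appends the CONFIGFILE line only to configs that lack it, so it can be rerun whenever a site is added.

```shell
#!/bin/sh
# Hypothetical layout: one main config per site in ./sites, plus a
# shared exclusion list. All names and host patterns are made up.
mkdir -p sites
printf 'HOSTNAME "Site A"\n' > sites/a.cfg
printf 'HOSTNAME "Site B"\n' > sites/b.cfg

# The shared list: the only file that needs editing when a new spider
# turns up.
cat > no-spiders.txt <<'EOF'
HOSTEXCLUDE crawler*.example.net
HOSTEXCLUDE *.spider.example.org
EOF

# Append the CONFIGFILE line to each main config that lacks it, so the
# loop can safely be rerun after adding a new site.
LINE='CONFIGFILE no-spiders.txt'
for cfg in sites/*.cfg; do
  grep -qxF "$LINE" "$cfg" || printf '%s\n' "$LINE" >> "$cfg"
done
```

With this layout, adding a spider means editing no-spiders.txt once, and every site config picks it up on the next run.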

my 2 lire


Massimo





Re: [analog-help] robots, including, excluding, and so on

2001-01-30 Thread Klaus Johannes Rusch

In [EMAIL PROTECTED], Stephen 
Turner [EMAIL PROTECTED] writes:
 Well, I guess the point is that Aaron's suggestion would break the idea of
 ROBOT*CLUDE being a report-level command. So then it wouldn't really be
 contradictory.

At the risk of breaking backward compatibility, I would prefer to see a clear
distinction in the name between report-level commands and processing-level
commands, i.e. ROBOT*CLUDE vs. ROBOTREP*CLUDE.

  ROBOTINCLUDE -FILE filename
  ROBOTEXCLUDE -FILE filename
  BROWEXCLUDE -FILE filename
  BROWINCLUDE -FILE filename
 
  This differs from CONFIGFILE because only the arguments would be in the
  file, not the commands. This could be generalized to many other commands
  but it is only really useful where you want to use the same list of
  arguments for different commands. I think only the item include/exclude
  commands would get used this way.
 
 Yes, it makes sense. I'm not sure whether I like it as an idea though. It's
 concise, but it possibly seems like too much of a "power user" option, in
 that it makes it harder to look in one place and figure out what's going on.

Converting a list of arguments to a configuration file is easy, e.g.

  perl -n -p -e"s/^/ROBOTEXCLUDE /" list

so I'm not sure another configuration file format is really required.
(Also, would -FILE allow for regexes or not?)
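The list-to-config conversion described here can also be done with sed. A minimal sketch, assuming a hypothetical robots.list with one browser pattern per line (the patterns are examples only): it skips blank lines and # comments, and writes one file per command, named after that command, so it stays obvious which command each generated file feeds.

```shell
#!/bin/sh
# Hypothetical input: one browser pattern per line. The patterns are
# examples only; blank lines and # comments are allowed.
cat > robots.list <<'EOF'
# robotish browsers
inktomi*

*bot*
EOF

# For each command, drop blank and comment lines, prefix every pattern
# with the command name, and write the result to <COMMAND>.cfg.
for CMD in ROBOTINCLUDE BROWEXCLUDE; do
  sed -e '/^[[:space:]]*$/d' -e '/^#/d' -e "s/^/$CMD /" robots.list > "$CMD.cfg"
done
```

A main configuration file would then pull the generated files in with lines such as CONFIGFILE ROBOTINCLUDE.cfg, so the pattern list itself is maintained in one place.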

-- 
Klaus Johannes Rusch
[EMAIL PROTECTED]
http://www.atmedia.net/KlausRusch/





Re: [analog-help] robots, including, excluding, and so on

2001-01-30 Thread Jeremy Wadsack

Stephen Turner wrote:

 How about this instead/in addition:
 
 For a selected set of commands, invent a syntax extension that says read 
 the arguments for this command from a file. I could then, for example, put 
 a list of robotish browsers in a file and use any of the following:
 
 ROBOTINCLUDE -FILE filename
 ROBOTEXCLUDE -FILE filename
 BROWEXCLUDE -FILE filename
 BROWINCLUDE -FILE filename
 
 This differs from CONFIGFILE because only the arguments would be in the 
 file, not the commands. This could be generalized to many other commands 
 but it is only really useful where you want to use the same list of 
 arguments for different commands. I think only the item include/exclude 
 commands would get used this way.
 
 Does this make sense?
 
 
 Yes, it makes sense. I'm not sure whether I like it as an idea though. It's
 concise, but it possibly seems like too much of a "power user" option, in
 that it makes it harder to look in one place and figure out what's going on.
 
 Does anyone else have an opinion on this?

I think from a support point of view this confuses the configuration 
syntax. Keeping to the same format will make it easier to find problems 
and for users to know what a file does. When you look at the 
no-robots.txt file on a system whose administration you just took over, 
you have to work backwards to find out whether it's used by a BROW*CLUDE 
or ROBOT*CLUDE command, or both. If the file contained the command in the 
first place, you'd know how it was used.

Finally, as Klaus said separately, it's simple (if you have Perl or sed 
or awk or something) to create a settings file from the file described 
above. If not, you can create a BROW*CLUDE file and use search and 
replace in your favorite text editor to change it to a ROBOT*CLUDE file.



-- 

Jeremy Wadsack
Wadsack-Allen Digital Group






Re: [analog-help] commandline parameters

2001-01-30 Thread Rainer Fuegenstein

Jeremy Wadsack wrote:
 
 No it's a shell quoting problem. Try this variant:
 
 SERVERNAME=+C\"HOSTNAME $SITE\"
 SERVERURL=+C\"BASEURL http://$SITE\"

Now that I see it, it looks obvious, but now I get:

./proclogs.sh: www.example.at": command not found
./proclogs.sh: http://www.example.at": No such file or directory

and $SERVERNAME and $SERVERURL are empty.





[analog-help] Grouping URLs

2001-01-30 Thread CAPRON Patrick





Is there a way to group URLs?
For example: /direct/ and /direct/my_cgi.cgi represent the
same URL, because default.cgi in /direct/ is a link to
my_cgi.cgi.


Thank you all for your much-needed help...



Patrick CAPRON
03.88.14.85.49
[EMAIL PROTECTED]






Re: [analog-help] commandline parameters

2001-01-30 Thread Rainer Fuegenstein

Jeremy Wadsack wrote:
 
 SERVERNAME=+C\"HOSTNAME $SITE\"
 SERVERURL=+C\"BASEURL http://$SITE\"

Sorry for spamming this list with wrong guesses. I had another idea, but
this one also doesn't work:

 SERVERNAME="+C\"HOSTNAME $SITE\""  
 SERVERURL="+C\"BASEURL http://$SITE\""  


results in:

+C"HOSTNAME www.example.at"
+C"BASEURL http://www.example.at"

/opt/analog/analog: analog version 4.13/Unix
/opt/analog/analog: Warning C: Unknown configuration command: ignoring it:
  "HOSTNAME
/opt/analog/analog: Warning C: Unknown configuration command: ignoring it:
  "BASEURL
/opt/analog/analog: Warning F: Failed to open logfile www.example.at": ignoring
  it
/opt/analog/analog: Warning F: Failed to open logfile http://www.example.at":
  ignoring it





[analog-help] UNSUBSCRIBE

2001-01-30 Thread Richard Norton



UNSUBSCRIBE


Re: [analog-help] Grouping URLs

2001-01-30 Thread Jeremy Wadsack

CAPRON Patrick wrote:

 Is there a way to group URL?
 For example : /direct/ and /direct/my_cgi.cgi represent the
 same URL, because my_cgi.cgi is linked to by default.cgi
 in /direct/.
 
FILEALIAS /direct/ /direct/my_cgi.cgi

-- 

Jeremy Wadsack
Wadsack-Allen Digital Group






Re: [analog-help] commandline parameters

2001-01-30 Thread Jeremy Wadsack

Rainer Fuegenstein wrote:

 Sorry for spamming this list with wrong guesses. I had another idea, but
 this one also doesn't work:
 
  SERVERNAME="+C\"HOSTNAME $SITE\""  
  SERVERURL="+C\"BASEURL http://$SITE\""  


I just realized that Analog should support non-quote delimiters. Try this:

#!/bin/sh

SITE="www.domain.com"
LOGFILE="$SITE.log"
OUTFILE="$SITE.html"

SERVERNAME="+C(HOSTNAME $SITE)"
SERVERURL="+C(BASEURL http://$SITE)"

/opt/analog/analog $SERVERNAME $SERVERURL $LOGFILE \
  $OUTFILE

-- 

Jeremy Wadsack
Wadsack-Allen Digital Group






[analog-help] Segmentation fault with Analog 4.14 and 4.90beta 1 on Solaris 8

2001-01-30 Thread Angus Rae

Hi,

I'm having a problem with Analog, both 4.14 and 4.90beta1: they
both compile and run, but then dump core. For example:

bash# ./analog --help
This is analog version 4.14/Unix
For help see docs/Readme.html, or http://www.analog.cx/
Segmentation Fault (core dumped)

bash# ./analog --help
This is analog version 4.90beta1/Unix
For help see docs/Readme.html, or http://www.analog.cx/
Segmentation Fault (core dumped)

The crash also happens when actually producing output:

bash# ./analog| more
./analog: analog version 4.90beta1/Unix
./analog: Warning D: Turning all pie charts off because OUTFILE is stdout
  (For help on all errors and warnings, see docs/errors.html)
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
... (gets down to Request Report) ...
  23:  0.22%: 30/Jan/01 18:31: /bto/images/eagle6.gif
  23:  0.
Segmentation Fault (core dumped)

The system is a Sun running Solaris 2.8 with the latest patches, and the
compiler is gcc version 2.95.2 19991024 (release), downloaded from
www.sunfreeware.com. For 4.14 I compiled with LIBS= -lnsl and
-DNEED_STRCMP; for 4.90beta1, with LIBS= -lnsl -lm and -DNEED_STRCMP.
(I originally compiled without -DNEED_STRCMP and got the crashes, then
added it to see if it helped.)

Any ideas what's going wrong? Is it possibly a gcc problem?

Thanks in advance,
Angus
-- 
Angus G Rae  Computing Services
Science & Engineering Support Team  University of Edinburgh
The above opinions are mine, and Edinburgh Uni can't have them.





Re: [analog-help] commandline parameters

2001-01-30 Thread Rainer Fuegenstein

Jeremy Wadsack wrote:

 
 I just realized that Analog should support non-quote delimiters. Try this:
 
 SERVERNAME="+C(HOSTNAME $SITE)"
 SERVERURL="+C(BASEURL http://$SITE)"

well 

+C(HOSTNAME www.example.at)
+C(BASEURL http://www.example.at)

/opt/analog/analog: analog version 4.13/Unix
/opt/analog/analog: Warning C: Unknown configuration command: ignoring it:
  (HOSTNAME
/opt/analog/analog: Warning C: Unknown configuration command: ignoring it:
  (BASEURL
/opt/analog/analog: Warning F: Failed to open logfile www.example.at): ignoring
  it
/opt/analog/analog: Warning F: Failed to open logfile http://www.example.at):
  ignoring it
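What the three attempts have in common is shell word splitting rather than analog's delimiter handling. When $SERVERNAME is expanded without double quotes, the shell splits it at the space, so analog receives +C(HOSTNAME and www.example.at) as two separate arguments; that matches the warnings above, where the second word is treated as a logfile name. In the very first attempt, the unquoted space also ended the assignment itself, so the shell tried to run www.example.at" as a command and SERVERNAME stayed empty. The sketch below uses a hypothetical printargs function in place of analog to show the difference. It assumes, untested against analog 4.13, that analog treats a whole +C argument as one configuration command.

```shell
#!/bin/sh
# printargs is a hypothetical stand-in for /opt/analog/analog: it prints
# how many arguments it received, then each one in brackets.
printargs() {
  printf '%s args\n' "$#"
  for a in "$@"; do
    printf '[%s]\n' "$a"
  done
}

SITE="www.example.at"

# Unquoted expansion: split at the space into two arguments,
# [+C(HOSTNAME] and [www.example.at)].
SERVERNAME="+C(HOSTNAME $SITE)"
printargs $SERVERNAME

# Quoted expansion: passed through as one argument.
printargs "$SERVERNAME"

# With quoting, no extra delimiters are needed. Assumption: analog takes
# everything after +C in a single argument as one configuration command.
SERVERNAME="+CHOSTNAME $SITE"
SERVERURL="+CBASEURL http://$SITE"
printargs "$SERVERNAME" "$SERVERURL"
```

Applied to the script earlier in the thread, that would mean writing /opt/analog/analog "$SERVERNAME" "$SERVERURL" "$LOGFILE". Judging by the warnings above, a bare extra argument such as $OUTFILE would also be read as a logfile name, so passing it as "+COUTFILE $OUTFILE" may be closer to what is intended.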





[analog-help] Initial set-up

2001-01-30 Thread country_directory

From Robbie at CD

I am having a problem setting up Analog. I think it is because
I have set the log format wrongly.
The error message is "Failed to open logfile logs: ignoring it".
The server runs IIS 5.0 (version 1), and the logs download via FTP into a
local directory called logs, which contains a folder W3svc203, which
contains ex010129.log. I have read the Analog documentation, including the
debugging notes, but I am still not certain which LOGFORMAT to use. For
LOGFILE I am using "logs" (the OUTFILE seems to be OK).

I have tried many combinations of log format and logfile names, but now
appeal to a higher and wiser authority for guidance.

Robbie






Re: [analog-help] robots, including, excluding, and so on

2001-01-30 Thread Dennis Nichols

At 1/30/01 01:56 PM, Klaus Johannes Rusch wrote:
In [EMAIL PROTECTED], 
Stephen Turner [EMAIL PROTECTED] writes:
   ROBOTINCLUDE -FILE filename
   ROBOTEXCLUDE -FILE filename
   BROWEXCLUDE -FILE filename
   BROWINCLUDE -FILE filename
 
  Yes, it makes sense. I'm not sure whether I like it as an idea though.

Converting a list of arguments to a configuration file is easy, e.g.
perl -n -p -e"s/^/ROBOTEXCLUDE /" list, so not sure if another
configuration file format is really required (also would -FILE allow for
regexes or not?)

I'm the proposer of the -FILE stuff above. Having seen Klaus' suggestion of 
generating the config file(s) from an argument list, I retract my proposal. 
I like his way just fine.



--
Dennis Nichols
[EMAIL PROTECTED]






[analog-help] Analog LOGFORMAT for IIS

2001-01-30 Thread H. Carter Harris

I am running IIS 4 and creating logfiles for my website.  I want to be able
to use Analog to do a little analysis on these files.  IIS is writing the
logfiles in W3C extended format.

Based on what I read in the Analog documentation, I think I need a LOGFORMAT
record to get it to recognize the Microsoft version of W3C.  I tried
LOGFORMAT MS-EXTENDED and I get the following when I run analog:

analog: analog version 4.90beta1/win32
analog: Warning C: Ignoring corrupt format line in logfile
analog ...cont..:   reason: time without date or vice versa
  For help on all errors and warnings, see docs/errors.html
analog: Warning L: Large number of corrupt lines in logfile ex010129.log: try
different LOGFORMAT
Current logfile format:
#Fields:\n
#%j\n
analog: Warning R: Turning off empty time reports ...

My logfile looks like this:

#Software: Microsoft Internet Information Server 4.0
#Version: 1.0
#Date: 2001-01-29 15:46:54
#Fields: time c-ip cs-method cs-uri-stem sc-status
15:46:54 216.78.145.215 GET /Default.asp 302
15:46:54 216.78.145.215 GET /login.asp 200
15:46:54 216.78.145.215 GET /technettn.css 200
15:46:54 216.78.145.215 GET /i/bgc.gif 200
15:47:04 216.78.145.215 POST /default.asp 200 ...
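The reason analog gives, "time without date or vice versa", can be compared with the header quoted above: the #Fields: line lists time but no date, which appears only in the #Date: directive. A minimal sketch that extracts the declared fields and reports which of the two is present; sample.log is a hypothetical scratch copy of the lines above. It only shows what the header declares and does not establish how analog treats the #Date: line.

```shell
#!/bin/sh
# Hypothetical scratch file holding the header lines quoted above.
LOG="sample.log"
cat > "$LOG" <<'EOF'
#Software: Microsoft Internet Information Server 4.0
#Version: 1.0
#Date: 2001-01-29 15:46:54
#Fields: time c-ip cs-method cs-uri-stem sc-status
15:46:54 216.78.145.215 GET /Default.asp 302
EOF

# Take the field names from the last #Fields: line in the file.
fields=$(grep '^#Fields:' "$LOG" | tail -n 1 | sed 's/^#Fields:[[:space:]]*//')
echo "fields: $fields"

# Check whether a time field and a date field are both declared.
has_time=no
has_date=no
for f in $fields; do
  case "$f" in
    time) has_time=yes ;;
    date) has_date=yes ;;
  esac
done
echo "time=$has_time date=$has_date"
```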

Even though I'm specifying the MS-EXTENDED logformat, do I need to put in
something else to identify the file format?  I edited the analog.cfg file
that was part of the download, changed the file name and added the
LOGFORMAT record.  It now looks like this (after the comments):

LOGFORMAT MS-EXTENDED
LOGFILE ex010129.log# to set where your logfile lives
# LOGFILE logfile.log
# OUTFILE outfile.html
HOSTNAME "[TechNet of Tennessee, Inc.]"
# REQINCLUDE pages
REQLINKINCLUDE pages ... and so on.

Any help would be greatly appreciated. - Carter

