Re: [analog-help] robots, including, excluding, and so on
On Mon, 29 Jan 2001, Dennis Nichols wrote:

> Greetings -
>
> First, my understanding of ROBOTINCLUDE and ROBOTEXCLUDE is that these are report-level commands, that is, they affect only the Operating System Report. Right?

Right. At the moment.

> A previous exchange on this list...
>
> On Thu, 25 Jan 2001, Stephen Turner wrote:
> > On Wed, 24 Jan 2001, Aaron Shoblaske wrote:
> > > In the 4.90beta, is there any way in one line to exclude from the entire report all robots that you've defined using robotinclude (eg. ROBOT none) or do you still have to type in a bunch of separate BROWEXCLUDEs for each robot (eg. BROWEXCLUDE inktomi*, etc.)? And if there isn't would it be hard to implement it in the beta?
> >
> > There isn't. It's a good idea though. I'll do it. Probably.
>
> The above is a good step but something about it seems odd - one says which browsers to include in the Operating System Report as being robots, and then one says exclude all such items from the entire report.

Well, I guess the point is that Aaron's suggestion would break the idea of ROBOT*CLUDE being a report-level command. So then it wouldn't really be contradictory.

> How about this instead/in addition: For a selected set of commands, invent a syntax extension that says read the arguments for this command from a file. I could then, for example, put a list of robotish browsers in a file and use any of the following:
>
>   ROBOTINCLUDE -FILE filename
>   ROBOTEXCLUDE -FILE filename
>   BROWEXCLUDE -FILE filename
>   BROWINCLUDE -FILE filename
>
> This differs from CONFIGFILE because only the arguments would be in the file, not the commands. This could be generalized to many other commands but it is only really useful where you want to use the same list of arguments for different commands. I think only the item include/exclude commands would get used this way. Does this make sense?

Yes, it makes sense. I'm not sure whether I like it as an idea though. It's concise, but it possibly seems like too much of a "power user" option, in that it makes it harder to look in one place and figure out what's going on. Does anyone else have an opinion on this?

--
Stephen Turner http://www.statslab.cam.ac.uk/~sret1/
Statistical Laboratory, Wilberforce Road, Cambridge, CB3 0WB, England
"Your account can only be used for a single internet session at any one time and for no more than 24 hours in any one day." (NTL terms of use)

This is the analog-help mailing list. To unsubscribe from this mailing list, send mail to [EMAIL PROTECTED] with "unsubscribe" in the main BODY OF THE MESSAGE.
List archived at http://www.mail-archive.com/analog-help@lists.isite.net/
Re: [analog-help] robots, including, excluding, and so on
On 30 Jan 2001, at 11:53, Stephen Turner wrote about Re: [analog-help] robots, including, excluding, a:

> >   ROBOTINCLUDE -FILE filename
> >   ROBOTEXCLUDE -FILE filename
> >   BROWEXCLUDE -FILE filename
> >   BROWINCLUDE -FILE filename
> >
> > This differs from CONFIGFILE because only the arguments would be in the file, not the commands. This could be generalized to many other commands but it is only really useful where you want to use the same list of arguments for different commands. I think only the item include/exclude commands would get used this way. Does this make sense?
>
> Yes, it makes sense. I'm not sure whether I like it as an idea though. It's concise, but it possibly seems like too much of a "power user" option, in that it makes it harder to look in one place and figure out what's going on. Does anyone else have an opinion on this?

My vote is YES - let's do it, please.

I intercept new spiders to exclude every week, and it would be a pain to modify the config files for all the websites I manage. So I'm already using a system like this - simply adding a

  CONFIGFILE no-spiders.txt

line to every main cfg file. And no-spiders.txt contains a list of HOSTEXCLUDE commands, which gets constantly updated.

I'm not a power user at all, but something like this really lets me "look in one place and figure out what's going on", as you said, Stephen.

my 2 lire
Massimo
Re: [analog-help] robots, including, excluding, and so on
In [EMAIL PROTECTED], Stephen Turner [EMAIL PROTECTED] writes:

> Well, I guess the point is that Aaron's suggestion would break the idea of ROBOT*CLUDE being a report-level command. So then it wouldn't really be contradictory.

At the risk of breaking backward compatibility I would prefer to see a clear distinction in the name between report-level commands and processing-level commands, i.e. ROBOT*CLUDE vs ROBOTREP*CLUDE

> >   ROBOTINCLUDE -FILE filename
> >   ROBOTEXCLUDE -FILE filename
> >   BROWEXCLUDE -FILE filename
> >   BROWINCLUDE -FILE filename
> >
> > This differs from CONFIGFILE because only the arguments would be in the file, not the commands. This could be generalized to many other commands but it is only really useful where you want to use the same list of arguments for different commands. I think only the item include/exclude commands would get used this way.
>
> Yes, it makes sense. I'm not sure whether I like it as an idea though. It's concise, but it possibly seems like too much of a "power user" option, in that it makes it harder to look in one place and figure out what's going on.

Converting a list of arguments to a configuration file is easy, e.g.

  perl -n -p -e"s/^/ROBOTEXCLUDE /" list

so not sure if another configuration file format is really required (also would -FILE allow for regexes or not?)

--
Klaus Johannes Rusch [EMAIL PROTECTED] http://www.atmedia.net/KlausRusch/
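Klaus's one-liner extends naturally to the case Dennis described, where one list of patterns should feed several commands. The following is only a minimal sketch in POSIX shell: the function name `make_config`, the temporary list file, and the pattern `examplebot*` are hypothetical illustrations, not part of Analog.

```sh
#!/bin/sh
# Hypothetical helper: turn a plain list of patterns (one per line) into
# Analog configuration lines, once for each command name given.
# Usage: make_config LISTFILE COMMAND [COMMAND ...]
make_config() {
    list=$1
    shift
    for cmd in "$@"; do
        # Drop blank lines and comment lines, then prefix the rest with the command.
        sed -e '/^[[:space:]]*$/d' -e '/^#/d' -e "s/^/$cmd /" "$list"
    done
}

# Example: build a throwaway list and generate two sets of lines from it.
tmp=$(mktemp)
printf '%s\n' 'inktomi*' 'examplebot*' > "$tmp"
make_config "$tmp" ROBOTINCLUDE BROWEXCLUDE
rm -f "$tmp"
```

Redirecting the output to a file and pulling that file in with a CONFIGFILE line, as Massimo does with no-spiders.txt, would keep the generated commands visible in the usual configuration syntax.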
Re: [analog-help] robots, including, excluding, and so on
Stephen Turner wrote:

> > How about this instead/in addition: For a selected set of commands, invent a syntax extension that says read the arguments for this command from a file. I could then, for example, put a list of robotish browsers in a file and use any of the following:
> >
> >   ROBOTINCLUDE -FILE filename
> >   ROBOTEXCLUDE -FILE filename
> >   BROWEXCLUDE -FILE filename
> >   BROWINCLUDE -FILE filename
> >
> > This differs from CONFIGFILE because only the arguments would be in the file, not the commands. This could be generalized to many other commands but it is only really useful where you want to use the same list of arguments for different commands. I think only the item include/exclude commands would get used this way. Does this make sense?
>
> Yes, it makes sense. I'm not sure whether I like it as an idea though. It's concise, but it possibly seems like too much of a "power user" option, in that it makes it harder to look in one place and figure out what's going on. Does anyone else have an opinion on this?

I think from a support point of view this confuses the configuration syntax. Keeping to the same format will make it easier to find problems and for users to know what a file does. When you look at the no-robots.txt file on the system that you just took over administration of, you have to work backwards to find out if it's used as a BROWSER* or ROBOT* command, or both. If the file contained the command in the first place you'd know how they were used.

Finally, as Klaus said separately, it's simple (if you have Perl or sed or awk or something) to create a settings file from the list file described above. If not, you can create a BROW*CLUDE file and use search and replace in your favorite text editor to change it to a ROBOT*CLUDE file.

--
Jeremy Wadsack
Wadsack-Allen Digital Group
Re: [analog-help] commandline parameters
Jeremy Wadsack wrote:

> No it's a shell quoting problem. Try this variant:
>
>   SERVERNAME=+C\"HOSTNAME $SITE\"
>   SERVERURL=+C\"BASEURL http://$SITE\"

Now that I see it, it looks obvious, but now I get:

  ./proclogs.sh: www.example.at": command not found
  ./proclogs.sh: http://www.example.at": No such file or directory

and $SERVERNAME and $SERVERURL are empty.
[analog-help] Grouping URLs
Is there a way to group URLs? For example: /direct/ and /direct/my_cgi.cgi represent the same URL, because my_cgi.cgi is linked to by default.cgi in /direct/.

Thank you all for your much-needed help...

Patrick CAPRON 03.88.14.85.49 [EMAIL PROTECTED]
Re: [analog-help] commandline parameters
Jeremy Wadsack wrote:

>   SERVERNAME=+C\"HOSTNAME $SITE\"
>   SERVERURL=+C\"BASEURL http://$SITE\"

Sorry for spamming this list with wrong guesses. I had another idea, but this one also doesn't work:

  SERVERNAME="+C\"HOSTNAME $SITE\""
  SERVERURL="+C\"BASEURL http://$SITE\""

results in:

  +C"HOSTNAME www.example.at" +C"BASEURL http://www.example.at"
  /opt/analog/analog: analog version 4.13/Unix
  /opt/analog/analog: Warning C: Unknown configuration command: ignoring it: "HOSTNAME
  /opt/analog/analog: Warning C: Unknown configuration command: ignoring it: "BASEURL
  /opt/analog/analog: Warning F: Failed to open logfile www.example.at": ignoring it
  /opt/analog/analog: Warning F: Failed to open logfile http://www.example.at": ignoring it
[analog-help] UNSUBSCRIBE
UNSUBSCRIBE
Re: [analog-help] Grouping URLs
CAPRON Patrick wrote:

> Is there a way to group URLs? For example: /direct/ and /direct/my_cgi.cgi represent the same URL, because my_cgi.cgi is linked to by default.cgi in /direct/.

  FILEALIAS /direct/ /direct/my_cgi.cgi

--
Jeremy Wadsack
Wadsack-Allen Digital Group
Re: [analog-help] commandline parameters
Rainer Fuegenstein wrote:

> Sorry for spamming this list with wrong guesses. I had another idea, but this one also doesn't work:
>
>   SERVERNAME="+C\"HOSTNAME $SITE\""
>   SERVERURL="+C\"BASEURL http://$SITE\""

I just realized that Analog should support non-quote delimiters. Try this:

  #!/bin/sh
  SITE="www.domain.com"
  LOGFILE="$SITE.log"
  OUTFILE="$SITE.html"
  SERVERNAME="+C(HOSTNAME $SITE)"
  SERVERURL="+C(BASEURL http://$SITE)"
  /opt/analog/analog $SERVERNAME $SERVERURL $LOGFILE \
      $OUTFILE

--
Jeremy Wadsack
Wadsack-Allen Digital Group
[analog-help] Segmentation fault with Analog 4.14 and 4.90beta 1 on Solaris 8
Hi,

I'm having a problem compiling up Analog, both 4.14 and 4.90beta1 - they both compile and run but core dump. For example:

  bash# ./analog --help
  This is analog version 4.14/Unix
  For help see docs/Readme.html, or http://www.analog.cx/
  Segmentation Fault (core dumped)

  bash# ./analog --help
  This is analog version 4.90beta1/Unix
  For help see docs/Readme.html, or http://www.analog.cx/
  Segmentation Fault (core dumped)

The crash is also produced if trying to actually output some data:

  bash# ./analog| more
  ./analog: analog version 4.90beta1/Unix
  ./analog: Warning D: Turning all pie charts off because OUTFILE is stdout
  (For help on all errors and warnings, see docs/errors.html)
  <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
  ... (gets down to Request Report) ...
  23: 0.22%: 30/Jan/01 18:31: /bto/images/eagle6.gif
  23: 0.
  Segmentation Fault (core dumped)

The system is a Sun running Solaris 2.8 with latest patches, and the compiler is gcc version 2.95.2 19991024 (release) downloaded from www.sunfreeware.com. During compilation of 4.14 the options LIBS= -lnsl and -DNEED_STRCMP were used, and for 4.90beta1 LIBS= -lnsl -lm and -DNEED_STRCMP. (I originally tried without -DNEED_STRCMP and got the crashes, then put it in to see if it helped.)

Any ideas what's going wrong? Is it possibly a gcc problem?

Thanks in advance,
Angus

--
Angus G Rae
Computing Services Science Engineering Support Team
University of Edinburgh
The above opinions are mine, and Edinburgh Uni can't have them.
Re: [analog-help] commandline parameters
Jeremy Wadsack wrote:

> I just realized that Analog should support non-quote delimiters. Try this:
>
>   SERVERNAME="+C(HOSTNAME $SITE)"
>   SERVERURL="+C(BASEURL http://$SITE)"

well

  +C(HOSTNAME www.example.at) +C(BASEURL http://www.example.at)
  /opt/analog/analog: analog version 4.13/Unix
  /opt/analog/analog: Warning C: Unknown configuration command: ignoring it: (HOSTNAME
  /opt/analog/analog: Warning C: Unknown configuration command: ignoring it: (BASEURL
  /opt/analog/analog: Warning F: Failed to open logfile www.example.at): ignoring it
  /opt/analog/analog: Warning F: Failed to open logfile http://www.example.at): ignoring it
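The warnings in this thread are consistent with ordinary shell word splitting: quote or bracket characters stored inside a variable are not re-read as quoting when the variable is expanded, so an unquoted $SERVERNAME is split at the space and Analog receives two separate arguments. The sketch below illustrates that behaviour only. It assumes Analog expects the text after +C to arrive as a single argument, which is what typing +C"..." at a prompt would deliver; `count_args` is a hypothetical helper, and the Analog invocation is left commented out because it has not been checked against version 4.13.

```sh
#!/bin/sh
# Hypothetical helper: report how many arguments a command would receive.
count_args() {
    echo "$#"
}

SITE="www.example.at"

# No quote characters are stored in the values; quoting happens at the point of use.
SERVERNAME="+CHOSTNAME $SITE"
SERVERURL="+CBASEURL http://$SITE"

count_args $SERVERNAME      # unquoted expansion: split at the space, prints 2
count_args "$SERVERNAME"    # quoted expansion: kept whole, prints 1

# Assumed invocation, with every expansion quoted so each stays one argument:
# /opt/analog/analog "$SERVERNAME" "$SERVERURL" "$SITE.log"
```

The design point is that the double quotes belong around the expansion on the command line, not inside the variable's value.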
[analog-help] Initial set-up
From Robbie at CD

I am having a problem with setting the Analog format. I think it is because I have set the log format wrongly. The error message is "Failed to open logfile logs: ignoring it".

The server supports IIS 5.0 version 1 and downloads into the local directory on the FTP client as logs (which opens as W3svc203 which opens as ex010129.log).

I have read the Analog documentation including the debugging notes but I am still not certain of the LOGFORMAT to use. For LOGFILE I am using "logs" (the OUTFILE seems to be OK).

I have tried many combinations of log format and log file names but now appeal to a higher and wiser authority for guidance.

Robbie
Re: [analog-help] robots, including, excluding, and so on
At 1/30/01 01:56 PM, Klaus Johannes Rusch wrote:

> In [EMAIL PROTECTED], Stephen Turner [EMAIL PROTECTED] writes:
>
> > >   ROBOTINCLUDE -FILE filename
> > >   ROBOTEXCLUDE -FILE filename
> > >   BROWEXCLUDE -FILE filename
> > >   BROWINCLUDE -FILE filename
> >
> > Yes, it makes sense. I'm not sure whether I like it as an idea though.
>
> Converting a list of arguments to a configuration file is easy, e.g. perl -n -p -e"s/^/ROBOTEXCLUDE /" list, so not sure if another configuration file format is really required (also would -FILE allow for regexs or not?)

I'm the proposer of the -FILE stuff above. Having seen Klaus' suggestion of generating the config file(s) from an argument list, I retract my proposal. I like his way just fine.

--
Dennis Nichols [EMAIL PROTECTED]
[analog-help] Analog LOGFORMAT for IIS
I am running IIS 4 and creating logfiles for my website. I want to be able to use Analog to do a little analysis on these files. IIS is writing the logfiles in W3C extended format. Based on what I read in the Analog documentation, I think I need a LOGFORMAT record to get it to recognize the Microsoft version of W3C. I tried LOGFORMAT MS-EXTENDED and I get the following when I run analog:

  analog: analog version 4.90beta1/win32
  analog Warning C: Ignoring corrupt format line in logfile
  analog ...cont..: reason: time without date or vice versa
  For help on all errors and warnings, see docs/errors.html
  analog: Warning L: Large number of corrupt lines in logfile ex010129.log: try different LOGFORMAT
  Current logfile format: #Fields:\n #%j\n
  analog: Warning R: Turning off empty time reports
  ...

My logfile looks like this:

  #Software: Microsoft Internet Information Server 4.0
  #Version: 1.0
  #Date: 2001-01-29 15:46:54
  #Fields: time c-ip cs-method cs-uri-stem sc-status
  15:46:54 216.78.145.215 GET /Default.asp 302
  15:46:54 216.78.145.215 GET /login.asp 200
  15:46:54 216.78.145.215 GET /technettn.css 200
  15:46:54 216.78.145.215 GET /i/bgc.gif 200
  15:47:04 216.78.145.215 POST /default.asp 200
  ...

Even though I'm specifying the MS-EXTENDED logformat, do I need to put in something else to identify the file format?

I edited the analog.cfg file that was part of the download and changed the file name and added the LOGFORMAT record. It now looks like this (after the comments):

  LOGFORMAT MS-EXTENDED
  LOGFILE ex010129.log# to set where your logfile lives
  # LOGFILE logfile.log
  # OUTFILE outfile.html
  HOSTNAME "[TechNet of Tennessee, Inc.]"
  # REQINCLUDE pages
  REQLINKINCLUDE pages
  ...

and so on. Any help would be greatly appreciated.

- Carter
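The reason Analog gives, "time without date or vice versa", lines up with the #Fields: line in the sample, which lists time but no date. A small sketch for checking a logfile header before running Analog follows; the function name `check_fields` is hypothetical, and the idea that turning on the date field in the IIS extended logging settings would clear the warning is an assumption, not something confirmed in this message.

```sh
#!/bin/sh
# Hypothetical helper: look at the first "#Fields:" line of a W3C extended
# logfile and report whether the date and time fields are both listed.
check_fields() {
    # Print only the first #Fields: line; strip carriage returns from CRLF files.
    fields=$(sed -n '/^#Fields:/{p;q;}' "$1" | tr -d '\r')
    has_date=no
    has_time=no
    case " $fields " in *" date "*) has_date=yes ;; esac
    case " $fields " in *" time "*) has_time=yes ;; esac
    echo "date=$has_date time=$has_time"
    # Succeed only when both fields are present.
    [ "$has_date" = yes ] && [ "$has_time" = yes ]
}

# Example using the header lines quoted in the message above.
tmp=$(mktemp)
printf '%s\n' \
    '#Software: Microsoft Internet Information Server 4.0' \
    '#Fields: time c-ip cs-method cs-uri-stem sc-status' > "$tmp"
check_fields "$tmp" || echo "header lacks date or time"
rm -f "$tmp"
```

For the sample header this reports that time is present and date is not, which matches the warning text.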