analog-help  

[analog-help] FW: Making Analog LOGFORMAT work with htdig logs

MARKIEWICZ_PETER
Thu, 21 Oct 1999 12:15:40 -0700



> Hi:
> 
> We're interested in analyzing logged queries taken from our htdig
> (http://www.htdig.com) search engine. Since htdig uses a POST request for
> its searches, it doesn't write a query string into the main web log. 
> 
> However, by turning logging on in the htdig configuration file and
> adjusting syslog.config, the program can be made to write each search
> request in a format similar to other access logs. The description of this
> option in the htdig documentation suggests that Analog could be configured
> to read this log format.
> 
> Two sample entries are shown below:
> 
> Oct 15 16:22:12 fire htsearch[2383]: 209.44.55.221 [htdig] (all) [adam
> blue] [(adam or adams) or (blue or blued)] (358/10) - 1 --
> http://kspace.com/cgi-bin/htsearch
> 
> Oct 15 16:22:44 fire htsearch[2343]: 209.44.55.221 [htdig] (all) [adam
> blue] [(adam or adams) or (blue or blued)] (358/10) - 1 --
> http://kspace.com/cgi-bin/htsearch?config=htdig&restrict=&exclude=&words=adam+blue
> 
> Note that the log has two entry formats: one showing the query terms
> alone, and another showing the query plus a pseudo GET string. The latter
> format only appears if the user re-searches after running an initial
> search.
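
To illustrate the two entry layouts described above, here is a minimal Python sketch that splits one entry into fields and checks whether the trailing URL carries a pseudo GET string. The field names (`client`, `words`, `expanded`, `referer`, and so on) are labels guessed from the two samples, not names taken from the htdig documentation, and the pattern assumes each entry sits on a single unwrapped line.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Pattern derived only from the two sample entries; field names are guesses.
ENTRY = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<loghost>\S+)\s+htsearch\[(?P<pid>\d+)\]:\s+"
    r"(?P<client>\S+)\s+\[(?P<config>[^\]]*)\]\s+\((?P<method>[^)]*)\)\s+"
    r"\[(?P<words>[^\]]*)\]\s+\[(?P<expanded>[^\]]*)\]\s+"
    r"\((?P<first>\d+)/(?P<second>\d+)\)\s+-\s+(?P<page>\d+)\s+--\s+"
    r"(?P<referer>\S+)\s*$"
)


def parse_entry(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = ENTRY.match(line.strip())
    return match.groupdict() if match else None


def referer_query(entry):
    """Return the pseudo GET parameters of the trailing URL ({} if there are none)."""
    query = urlsplit(entry["referer"]).query
    return parse_qs(query, keep_blank_values=True)
```

An entry of the first kind yields an empty dict from `referer_query`, while an entry of the second kind yields the `config`, `restrict`, `exclude` and `words` parameters.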
> 
> We tried to use the LOGFORMAT command to read this file with analog, using
> the following format:
> 
> LOGFORMAT (%M %d:%h:%j %j htsearch[%j] %S [htdig] %j [%q] [%j] %j - %j --
> %r)
> 
> Initially, we ran into the following errors:
> 
> Warning C: Bad argument in configuration command: ignoring it
> 
> ..cont...: (reason: time without date or vice versa)
> 
> This could be fixed by making all the date/time fields %j (since part of
> the date is missing in the htdig format). However, even when all date
> references are removed:
> 
> LOGFORMAT (%j %j %j  %j htsearch[%j] %S [htdig] %j [%q] [%j] %j - %j --
> %r)
> or
> LOGFORMAT (%j %j %j  %j htsearch[%j] %S [htdig] %j [%q %j] [%j] %j - %j --
> %r)
> 
> The program generates the following errors:
> 
> Warning M: Logfile /usr/local/etc/httpd/logs/htdig.log contains lines
> with no bytes: byte counts may be low
> 
> Warning L: Large number of corrupt lines in logfile
> /usr/local/etc/httpd/logs/htdig.log: try different LOGFORMAT
> 
> 
> This is apparently because the [%q] field and the [%j] field just after it
> contain spaces: htdig writes the query keywords and their variations
> without padding. Since the number of keywords and keyword permutations
> (and the spaces between them) varies from entry to entry, we see no way to
> account for the unpadded keyword area.
> 
> Can anyone suggest a workaround? By manual inspection we've found that our
> local htdig logs provide extremely interesting information -- essentially
> queries from users who are within our site but are confused -- as opposed
> to the query terms that originally brought them there in the main log. 
> 
> 