MARKIEWICZ_PETER
Thu, 21 Oct 1999 12:15:40 -0700
> Hi:
>
> We're interested in analyzing logged queries taken from our htdig
> (http://www.htdig.com) search engine. Since htdig uses a POST command for
> its searches, it doesn't write a query string into the main web log.
>
> However, by turning logging on in the htdig configuration file and
> adjusting syslog.conf, the program can be made to write each search
> request in a format similar to other access logs. The description of this
> option in the htdig documentation suggests that Analog could be configured
> to read the log format.
>
> Two sample entries are shown below:
>
> Oct 15 16:22:12 fire htsearch[2383]: 209.44.55.221 [htdig] (all) [adam blue] [(adam or adams) or (blue or blued)] (358/10) - 1 -- http://kspace.com/cgi-bin/htsearch
>
> Oct 15 16:22:44 fire htsearch[2343]: 209.44.55.221 [htdig] (all) [adam blue] [(adam or adams) or (blue or blued)] (358/10) - 1 -- http://kspace.com/cgi-bin/htsearch?config=htdig&restrict=&exclude=&words=adam+blue
>
> Note that the log has two entry formats: one showing the query terms
> alone, and another showing the query plus a pseudo GET string. The latter
> format only appears if the user re-searches after running an initial
> search.
>
> We tried to use the LOGFORMAT command to read this file with Analog, using
> the following format:
>
> LOGFORMAT (%M %d:%h:%j %j htsearch[%j] %S [htdig] %j [%q] [%j] %j - %j -- %r)
>
> Initially, we ran into the following errors:
>
> Warning C: Bad argument in configuration command: ignoring it
> ..cont...: (reason: time without date or vice versa)
>
> This could be fixed by making all the date/time fields %j (the htdig
> entries carry no year, so the date is incomplete).
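For reference, the field layout of the two sample entries can be sketched as a Python regular expression. This is a minimal sketch, not part of htdig or Analog: the field names (client, config, method, words, logic, matches, per_page, page, referer) are guesses at what each slot means, inferred only from the samples, and the pattern assumes that a bracketed field never itself contains a `]`. Because each bracketed or parenthesized field is matched up to its own closing delimiter, a variable number of spaces inside `[adam blue]` does not shift the fields that follow it.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Hypothetical field names; the layout is inferred only from the two
# sample htsearch syslog entries, not from htdig documentation.
HTSEARCH_LINE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) htsearch\[(?P<pid>\d+)\]: "
    r"(?P<client>\S+) \[(?P<config>[^\]]*)\] \((?P<method>[^)]*)\) "
    # Bracketed fields are matched up to the closing ']', so any number
    # of spaces inside them is fine (assumes no ']' inside a query).
    r"\[(?P<words>[^\]]*)\] \[(?P<logic>[^\]]*)\] "
    r"\((?P<matches>\d+)/(?P<per_page>\d+)\) - (?P<page>\d+) -- "
    r"(?P<referer>\S+)$"
)

SAMPLE_LINES = [
    "Oct 15 16:22:12 fire htsearch[2383]: 209.44.55.221 [htdig] (all) "
    "[adam blue] [(adam or adams) or (blue or blued)] (358/10) - 1 -- "
    "http://kspace.com/cgi-bin/htsearch",
    "Oct 15 16:22:44 fire htsearch[2343]: 209.44.55.221 [htdig] (all) "
    "[adam blue] [(adam or adams) or (blue or blued)] (358/10) - 1 -- "
    "http://kspace.com/cgi-bin/htsearch?config=htdig&restrict=&exclude="
    "&words=adam+blue",
]


def parse_htsearch_line(line):
    """Return a dict of fields for one htsearch entry, or None if it doesn't match."""
    m = HTSEARCH_LINE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    # The second entry format carries a pseudo GET string in the URL;
    # pull its 'words' parameter out ('+' is decoded to a space).
    query = parse_qs(urlsplit(fields["referer"]).query)
    fields["referer_words"] = query.get("words", [""])[0]
    return fields


if __name__ == "__main__":
    for sample in SAMPLE_LINES:
        fields = parse_htsearch_line(sample)
        print(fields["client"], repr(fields["words"]), repr(fields["referer_words"]))
```

Run over htdig.log line by line, a parser like this could feed the extracted words into a simple tally, or rewrite each entry into a fixed layout; whether Analog can then read the rewritten file depends on its LOGFORMAT rules and is not tested here.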
> However, even when all date references are removed:
>
> LOGFORMAT (%j %j %j %j htsearch[%j] %S [htdig] %j [%q] [%j] %j - %j -- %r)
>
> or
>
> LOGFORMAT (%j %j %j %j htsearch[%j] %S [htdig] %j [%q %j] [%j] %j - %j -- %r)
>
> the program generates the following errors:
>
> Warning M: Logfile /usr/local/etc/httpd/logs/htdig.log contains lines
> with no bytes: byte counts may be low
>
> Warning L: Large number of corrupt lines in logfile
> /usr/local/etc/httpd/logs/htdig.log: try different LOGFORMAT
>
> This is apparently because the [%q] field and the [%j] field just after
> it contain spaces: htdig writes the query keywords and their variations
> without padding. Since the number of keywords and keyword permutations,
> and hence the number of spaces, varies from entry to entry, there's no
> way to account for the unpadded keyword area.
>
> Can anyone suggest a workaround? By manual inspection we've found that our
> local htdig logs provide extremely interesting information: essentially,
> queries from users who are already within our site but are confused, as
> opposed to the query terms that originally brought them there, which
> appear in the main log.

------------------------------------------------------------------------
This is the analog-help mailing list. To unsubscribe from this
mailing list, send mail to [EMAIL PROTECTED]
with "unsubscribe analog-help" in the main BODY OF THE MESSAGE.
List archived at http://www.mail-archive.com/analog-help@lists.isite.net/
------------------------------------------------------------------------