You can use the REFEXCLUDE line to remove all cache entries from your
reports if you don't want them at all. Something like this should
work:

  REFEXCLUDE http://*google*/*q=cache*
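
As a rough sanity check of that pattern outside analog, here is a minimal Python sketch. It assumes analog's "*" behaves like a shell-style wildcard, which is what fnmatch implements; that equivalence is my assumption, not something taken from the analog docs. The two long referrers are the ones quoted below in this thread; PLAIN_REF and the STRICTER pattern are made up for illustration.

```python
from fnmatch import fnmatchcase

# Referrers copied from the log lines quoted in this thread.
CACHE_REF = ("http://www.google.com.tr/search?q=cache:_DSfUGirLE8J:"
             "atom.physik.tu-berlin.de/pub2002.0.html+Photoionisation"
             "+studies+of+the+2p+resonances+of+atomic+Calcium&hl=tr&ie=UTF-8")
RELATED_REF = ("http://www.google.de/search?hl=de&lr=lang_de&ie=UTF-8"
               "&oe=UTF-8&q=related:www.physik.tu-berlin.de/~tallera/gb/"
               "gaestebuch.phtml")
# Hypothetical ordinary search whose first word happens to be "cache".
PLAIN_REF = "http://www.google.com/search?q=cache+memory"

PATTERN = "http://*google*/*q=cache*"    # the pattern suggested above
STRICTER = "http://*google*/*q=cache:*"  # hypothetical variant with the colon


def excluded(referrer, pattern=PATTERN):
    """Return True if the wildcard pattern matches the whole referrer."""
    return fnmatchcase(referrer, pattern)


if __name__ == "__main__":
    for name, ref in [("cache", CACHE_REF), ("related", RELATED_REF),
                      ("plain", PLAIN_REF)]:
        print(name, excluded(ref), excluded(ref, STRICTER))
```

Under that assumption the suggested pattern also catches an ordinary search that merely starts with the word "cache", so including the colon may be the safer choice.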

If you want to keep the requests (because they are valid, even if
there appear to be a lot of them) but don't want them to show up in
the search word report, you could either change them with a REFALIAS
command so they no longer match the SEARCHENGINE command, or make the
Google SEARCHENGINE command more specific. (I think that with a
REGEXP, if it's really Perl-compatible, you can use a zero-width
negative look-ahead assertion to make sure that the query isn't a
cache: item.)
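
To illustrate the look-ahead idea, here is a small Python sketch. Python's re module uses the same (?!...) syntax as Perl-compatible regular expressions, but whether analog's REGEXP support accepts this exact expression in a SEARCHENGINE command is an assumption I have not verified. The pattern and the function name are hypothetical; the first two referrers come from the log lines quoted below, and PLAIN_REF is invented.

```python
import re

# Match a Google search referrer only when the q= argument does NOT
# start with "cache:". The (?!cache:) part is the zero-width negative
# look-ahead: it checks the following text without consuming it.
GOOGLE_QUERY = re.compile(r"^http://[^/]*google\.[^/]*/.*[?&]q=(?!cache:)")

CACHE_REF = ("http://www.google.com.tr/search?q=cache:_DSfUGirLE8J:"
             "atom.physik.tu-berlin.de/pub2002.0.html+Photoionisation"
             "+studies+of+the+2p+resonances+of+atomic+Calcium&hl=tr&ie=UTF-8")
RELATED_REF = ("http://www.google.de/search?hl=de&lr=lang_de&ie=UTF-8"
               "&oe=UTF-8&q=related:www.physik.tu-berlin.de/~tallera/gb/"
               "gaestebuch.phtml")
# Hypothetical ordinary search, for comparison.
PLAIN_REF = "http://www.google.com/search?q=cache+memory"


def is_counted_search(referrer):
    """Return True if the referrer would still count as a search."""
    return GOOGLE_QUERY.search(referrer) is not None


if __name__ == "__main__":
    for name, ref in [("cache", CACHE_REF), ("related", RELATED_REF),
                      ("plain", PLAIN_REF)]:
        print(name, is_counted_search(ref))
```

This only filters cache: items, as described above. To drop related: referrers too, the look-ahead could be widened to (?!cache:|related:).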

-- 

Jeremy Wadsack
Wadsack-Allen Digital Group


Tobias Stefan Richter ([EMAIL PROTECTED]; Friday, August 22, 2003 8:13 AM):

> Hi,

> when people browse Google's cached version of a web page of 
> mine or use the 'what's related' feature, they leave lines
> like these in my log file:

> 0.129.141.165 - - [15/Aug/2003:01:26:45 +0200] "GET / HTTP/1.1" 200 8630
> "http://www.google.de/search?hl=de&lr=lang_de&ie=UTF-8&oe=UTF-8&q=related:www.physik.tu-berlin.de/~tallera/gb/gaestebuch.phtml"
>  "Mozilla/4.0"

> 0.27.136.10 - - [19/Aug/2003:14:52:46 +0200] "GET / HTTP/1.1" 304 -
> "http://www.google.com.tr/search?q=cache:_DSfUGirLE8J:atom.physik.tu-berlin.de/pub2002.0.html+Photoionisation+studies+of+the+2p+resonances+of+atomic+Calcium&hl=tr&ie=UTF-8"
>  "Mozilla/4.0"

> They enter the Search Query Report and the Search Word Report.
> Using a fairly standard config file with analog 5.23 (I also
> tried 5.32), I'm not quite happy with the outcome, for the
> following reasons:

> - the related entry should not appear in the Search Word Report

> - the cache:_xxx[URL] part should be suppressed completely in 
>   both reports

> - unlike real searches, where one sees a single Referer line
>   per client (for the main document), a line containing
>   "cache:_xxx[URL]" is seen for every image referenced by the main
>   document (which itself is not fetched). As a result, 'cache'
>   search queries and words appear exaggeratedly popular.

> I have no good idea yet how to fix this.
> At least I doubt it can be easily configured in analog, or did I miss
> something?

> In the meantime, while the first two issues aren't solved, maybe
> this one is easier: for the Search Word Report, the URL part
> of the 'cache:' or 'related:' expressions is split into two parts
> because of the hyphen "-" in it, so I get "cache:_dsfugirle8j:atom.physik.tu"
> as a popular search word and "berlin.de/pub2002.0.html" as well.
> Why does analog split at the hyphen at all?

> Bye,
> Tobias

