On Tue, 23 Nov 1999, you wrote:
>
>It would be really nice if there was a way to have analog output a
>list of referrers with query strings that are NOT covered by the
>current set of SEARCHENGINE commands -- then one could look at that
>list every week or so and go track down the new ones.
>

This is useful, but I think it would be beyond the scope of analog. 

In my opinion the current system of SEARCHENGINE commands is flawed, but only
because new search engines keep appearing and existing ones keep changing. So
we're always going to have to go in and update these entries, checking sites
like http://www.searchenginewatch.com for changes on a regular basis.

Perhaps there is some kind of pattern to how search engine referrals show up in
the log files, though. Most of the CGIs powering search engines call their
keyword variable "query" or something similar, and most of the other variables
in the query string have numeric values. So maybe some generic way of
determining what the keywords are could be worked out, although it would be
error prone. How about if analog wrote possible search engine URLs, and the
variable it thought held the keywords, to a file, as it does for DNS lookups?

Meanwhile, though, you can do it with basic Unix tools such as cut, grep and
uniq (assuming you are on Unix):

Go to your log file directory and type:

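find . -name "*.gz" -mtime -14 -exec zcat {} + | grep -F 'query?' | cut -f11 -d " " | sort | uniq -c

This decompresses every file with the extension ".gz" in that directory (and
any subdirectories) that was modified in the last 14 days, keeps only the lines
containing the literal string "query?", and strips every space-separated field
except the eleventh, which is the referrer field in my case (and in the
standard combined log format). Finally, sort and uniq -c collapse duplicate
referrers into one line each, with a count in front.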

It's not infallible, but it picks up a lot of sites, and it doesn't choke on
search terms wrapped in lots of double quotes, which analog sometimes does.
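To get closer to what was originally asked for, only the referrers that the
current SEARCHENGINE commands do not already handle, the referrer list can be
filtered against your configuration file. Below is a rough sh sketch. It
assumes the SEARCHENGINE lines look like "SEARCHENGINE http://*yahoo.*/* p" (a
URL pattern followed by the query variable name) and that the patterns only use
* as a wildcard; uncovered_referrers is a made-up name.

```shell
#!/bin/sh
# uncovered_referrers: hypothetical helper, not part of analog.
# Reads referrer URLs, one per line, on standard input and prints those that
# carry a query string but match none of the SEARCHENGINE patterns in the
# configuration file named by $1.
# Assumes config lines of the form:  SEARCHENGINE http://*yahoo.*/* p
uncovered_referrers() (
    set -f    # stop the shell expanding patterns like http://*yahoo.*/* as file names
    patterns=$(awk 'toupper($1) == "SEARCHENGINE" { print $2 }' "$1")
    # Drop the double quotes the combined log format puts around the referrer
    tr -d '"' | while IFS= read -r ref; do
        case $ref in
            *\?*) ;;          # has a query string: check it below
            *) continue ;;    # no query string: not a search referral
        esac
        covered=no
        for pat in $patterns; do
            # $pat is unquoted, so case treats it as a glob in which * also matches /
            case $ref in
                $pat) covered=yes; break ;;
            esac
        done
        if [ "$covered" = no ]; then
            printf '%s\n' "$ref"
        fi
    done
)
```

With the function defined in the current shell, something like this should
give a list of new engines to track down, most frequent first (analog.cfg
stands for wherever your SEARCHENGINE lines live):

find . -name "*.gz" -mtime -7 -exec zcat {} + | cut -f11 -d " " | uncovered_referrers analog.cfg | sort | uniq -c | sort -rn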

Ale

--
Alejandro Fernandez, ------------ "Virtual Communities made real"
Webmaster Sift Plc.,  The Mill House, Redcliff Backs, Bristol BS1 6LY
e: [EMAIL PROTECTED]   t: ++44 117 915 9600  http://www.sift.co.uk
------------------------------------------------------------------------
This is the analog-help mailing list. To unsubscribe from this
mailing list, send mail to [EMAIL PROTECTED]
with "unsubscribe analog-help" in the main BODY OF THE MESSAGE.
List archived at http://www.mail-archive.com/[email protected]/
------------------------------------------------------------------------
